Abstract: Locally repairable codes (LRCs) are ingeniously
designed distributed storage codes with a (usually small) fixed
set of helper nodes participating in repair. Since most existing
LRCs assume exact repair and allow full exchange of the stored
data (β = α) from the helper nodes, they can be viewed as a
generalization of the traditional erasure codes (ECs) with the
much-desired feature of local repairability via predetermined
sets of helpers. However, it also means that they lack the
features of (i) functional repair and (ii) partial
information-exchange (β < α) found in the original regenerating
codes (RCs), which could further reduce the repair bandwidth.
Motivated by the significant bandwidth reduction of RCs over ECs,
existing works by Ahmad et al. and by Hollmann studied the concept
of “locally repairable regenerating codes (LRRCs)” that
successfully combine functional repair and partial information
exchange of regenerating codes with the much-desired local
repairability feature of LRCs. The resulting LRRCs demonstrate
significant bandwidth reduction.

One important issue that needs to be addressed by any
local repair scheme (including both LRCs and LRRCs) is
that sometimes designated helper nodes may be temporarily
unavailable as the result of multiple failures, degraded reads, or
other network dynamics. Under the setting of LRRCs with
temporary node unavailability, this work studies the impact of
different helper selection methods. It proves that with node
unavailability, all existing methods of helper selection, including
those used in RCs and LRCs, can be insufficient in terms of
achieving the optimal repair-bandwidth. For some scenarios, it
is necessary to combine LRRCs with a new helper selection
method, termed dynamic helper selection, to achieve optimal
repair bandwidth. This work also compares the performance
of different helper selection methods and answers the following
fundamental question for various scenarios: is one method of
helper selection intrinsically better than the other?


I. Introduction 

Erasure coding (EC) is efficient in terms of reliability versus 
redundancy tradeoff in distributed storage systems. An (n, k) 
MDS code, when applied over a network of n storage nodes, 
can tolerate n - k simultaneous failures. When a node fails, it is
repaired by accessing any k surviving nodes, downloading all 
the coded data stored in these k nodes, and then reconstructing 
the original data. As a result, we say that the repair of EC 
involves “full information exchange” and “exact repair”. 

Regenerating codes (RCs) [7], on the other hand, were
proposed to decrease the amount of communication required
during repair, oftentimes termed the repair-bandwidth. The
three key ideas that allow regenerating codes to decrease
repair-bandwidth are: (i) contact as many nodes as possible
during repair, in other words d = n - 1 nodes, termed
helper nodes; (ii) download only a partial fraction of the data
(β < α) as opposed to full information-exchange (β = α);
and (iii) allow functional repair, which is a generalization of
exact repair in ECs. These three ideas enable significant
repair-bandwidth reduction of RCs over ECs [7].

Another type of distributed storage codes is the locally
repairable codes (LRCs) i), HU, 112, III, US, El that
use a small number of helper nodes d during repair, which is
in contrast with RCs that were originally designed for large d
(i.e., d ≥ k). A closer look at the properties of LRCs shows
that LRCs resemble ECs in that they operate with α = β and
under exact repair. The main difference¹ is that ECs access
d = k helpers while LRCs use much smaller d values (usually
≪ k). For that reason, LRCs can be viewed as a generalization
of ECs with the much-desired feature of local repairability
(small d). Inspired by the repair-bandwidth reduction of RCs
over ECs, it is thus natural to ask whether it
is possible to design locally repairable regenerating codes
(LRRCs) that simultaneously admit all three features: local
repairability (d < k), partial information-exchange (β < α),
and functional repair.

It is worth noting that for any locally repairable code (small
d), how to select the d helpers (out of the remaining n - 1
nodes) is a critical part of the underlying code design and could
have significant impact on its performance. Therefore, any
attempt at designing LRRCs that combine local repairability
(d < k), partial information-exchange, and functional repair
must address the challenge of how to design the underlying
helper selection policy.

It turns out that the original work [7] that proposed RCs does
consider the possibility of LRRCs since it derives the
storage-repair-bandwidth tradeoff curves for arbitrary d ≤ n - 1.
However, a close look at the derivation in [7] shows that it
assumes that the LRRC “blindly chooses the d helpers” and
then characterizes the corresponding worst-case performance

¹Another subtle difference is that for ECs, a newcomer can access any set
of d helpers while for LRCs each newcomer can only access a predetermined
set of d helpers. As will be rigorously defined in Section II-B, the former is
termed the blind helper selection and the latter is called the stationary helper
selection.

TABLE I
The comparison table between existing codes and LRRCs of this work.

Code                Repair Mode  Info-Exchange  Helper Selection  Temporary Node Unavailability  Target Parameters
------------------  -----------  -------------  ----------------  -----------------------------  -----------------
ECs                 Exact        Full           Blind             No Treatment Needed            d = k
Original RCs        Functional   Partial        Blind             No Treatment Needed            d ≥ k
Exact Repair RCs    Exact        Partial        Blind             No Treatment Needed            d ≥ k
LRCs                Exact        Full           Intelligent       Need Special Treatment         d < k
LRRCs of this work  Functional   Partial        Intelligent       Need Special Treatment         All Parameters


under such a blind helper selection (BHS). In some sense,
[7] has implicitly analyzed LRRCs under BHS, the most
pessimistic helper selection scheme. It was not clear from
the results in [7] whether the performance of LRRCs can be
further improved if some sophisticated helper selection scheme
other than BHS is used.

In contrast, the existing works by Ahmad et al. and by Hollmann [10]
are the first works to study LRRCs when intelligent (non-blind) helper
selection policies are used. For the special cases of k = n - 1
and α = dβ or α = β, [10] proves a lower bound on the
repair bandwidth (BW) of any LRRC regardless of whether an
intelligent or a BHS scheme is used. The lower bound turns
out to be tight and achievable by some modified LRC scheme
in [10] for certain (n, k, d) combinations.

At the same time, Ahmad et al. answer the following related
question: under what (n, k, d) values can intelligent helper
selection strictly improve the performance of LRRCs when
compared to the BHS-based LRRC in [7]? This question was
fully answered for any arbitrary (n, k, d) parameters. A new
scheme termed the family helper selection (FHS) scheme was
also devised by Ahmad et al. that demonstrates superior performance
(very small repair BW) while admitting local repairability. The
FHS scheme of Ahmad et al. is provably optimal (i.e.,
attains the upper bounds of Ahmad et al. and of [10]) for a much wider
range² of (n, k, d) values than the modified LRC scheme in
[10].

Table I summarizes the differences among ECs, RCs,
LRCs, and LRRCs in terms of repair modes, the amount 
of information exchange, the corresponding helper selection 
schemes, and the target parameter values. The blue-shaded 
blocks correspond to the ideas that are known to be able 
to reduce the repair BW. Note that only LRCs and LRRCs 
employ intelligent helper selection rules while both ECs and 
RCs employ blind helper selection. 

Despite the preliminary promising results, the LRRCs considered
by Ahmad et al. and by Hollmann [10] do not consider the following
practical issue: because of multiple failures, degraded reads,
or other network dynamics, some designated helper nodes
may be temporarily unavailable. Therefore, for any locally
repairable scheme to work in practice, including both LRCs
and LRRCs, it needs to have an alternative set of helpers in
case of node unavailability. See the column titled “Temporary
Node Unavailability” in Table I. For LRCs, temporary node
unavailability has been studied in ifTsl , lITSl , ifTTl . In this work,
we study the performance of the LRRCs of Ahmad et al. and of [10] under

²The scheme in [10] can be viewed as a special example of the FHS scheme
and its variant by Ahmad et al.


different helper selection policies while taking into account 
the issue of temporary node unavailability. 

Our studies are centered around three different classes
of helper selection schemes: (i) the BHS schemes; (ii) the
stationary helper selection (SHS) schemes; and (iii) a new
class of schemes proposed in this work, called dynamic
helper selection (DHS). These three classes of helper selection
schemes will be formally defined in Section II-B. As will
be explained in detail in Section II-C, the helper selection
policies of all existing designs of ECs, RCs, LRCs, and LRRCs
/El?, /O?, nh, ES are either BHS or SHS.

The main contributions of this work are summarized as 
follows. 

Contribution 1: We prove, for the first time in the literature,
that both BHS and SHS can be insufficient in terms of achieving
the optimal repair-bandwidth. Specifically, we provide an
example with r = 1 showing that it is necessary to use DHS,
which is designed based on a completely different principle,
to achieve optimal repair-bandwidth, while the performance
of BHS and of any SHS is strictly suboptimal. Furthermore,
the DHS scheme in our example is simultaneously minimum
bandwidth regenerating (MBR) and minimum storage regenerating
(MSR), attaining a new storage-BW tradeoff point that
was previously believed to be impossible except for some
trivial degenerate cases. Such an example demonstrates the
benefit of DHS and calls for further research on
DHS designs.

Contribution 2: Being a blind scheme, BHS is the least
powerful of the three helper selection policies and can thus
be used as a baseline. We study the following fundamental
question: given any (n, k, d, r) value, can we design
an SHS or DHS scheme that strictly outperforms BHS?
Surprisingly, for many (n, k, d, r) values the answer is no.
That is, for those (n, k, d, r) values, even the best SHS or DHS
scheme is no better than the simple BHS solution used in the
original RCs [7]. We call those (n, k, d, r) values
indifferent to helper selection since the performance does not
depend on what type of helper selection scheme is used.

Knowing whether a given (n, k, d, r) value is indifferent
to helper selection is of significant practical value since a 
distributed storage code designer can then decide whether to 
simply use the most basic BHS scheme (if the underlying 
(n, k, d, r) is indifferent to helper selection) or to invest time 
and effort to design more sophisticated helper selection rules 
to further improve the performance of the system. 

In this work, we prove that for a vast majority of (n, k, d, r) 
values, we can answer unambiguously whether it is indifferent 
to helper selection or not by checking some very simple 
conditions.

Summary: The main contribution of this work is mostly
information-theoretic. The carefully constructed example
sheds new light on the fundamental performance
limits of different helper selection schemes in the
context of RCs and LRRCs. The results in Contribution 2
allow us to quickly check whether a given (n, k, d, r) value is
indifferent to helper selection or not, which provides valuable
case-by-case guidelines on whether it is beneficial to spend time
designing new SHS or DHS schemes or whether one should
simply use the simple BHS.

The rest of this paper is organized as follows. Section II
motivates the problem and introduces key definitions and
notation. Section III describes information-flow graphs, the
main tool we use for the analysis of LRRCs. Section IV
presents the main results of this work. Section V presents the
proofs of the results of Contribution 1. Section VI presents the
proofs of the results of Contribution 2. Section VII concludes
this work.

II. Problem Statement
A. The Parameters of A Distributed Storage Network 

This work follows the same distributed storage network
model as introduced in the seminal work [7]. For completeness,
we provide in the following detailed definitions of some
key parameters. Further explanations of the system model can
be found in [7].

Parameters n and k: We denote the total number of nodes
in a storage network by n. For any 1 ≤ k ≤ n - 1, we say
that a code can satisfy the reconstruction requirement if any
k nodes can be used to reconstruct the original data/file. For
example, consider a network of 7 nodes. A (7,4) Hamming
code can be used to protect the data. We say that the Hamming
code can satisfy the reconstruction requirement for k = 6,
since any 6 nodes can reconstruct the original file. By the same
definition, the Hamming code can also satisfy the reconstruction
requirement for k = 5, since its minimum distance is 3 and
any 2 erasures are therefore correctable. It cannot satisfy the
reconstruction requirement for k = 4: the code contains
weight-3 codewords, and the 4 nodes outside the support of such
a codeword cannot distinguish it from the all-zero codeword. The
smallest k of the (7,4) Hamming code is thus k* = 5. In general,
the value of k is related to the desired protection level of the
system while the value of k* is related to the actual protection
level offered by the specific distributed storage code
implementation.

For example, suppose the design requirement is k = 6. We
can still opt for using the (7,4) Hamming code to provide the
desired level of protection. However, using the (7,4) Hamming
code may be an overkill since a (7,4) Hamming code has
k* = 5 and it is possible to just use a single parity bit to
achieve k = 6.
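
The reconstruction requirement can be checked by brute force for a small code. The sketch below is illustrative only: it assumes one particular systematic generator matrix of a (7,4) Hamming code and assumes that node j stores the j-th coded bit; the helper names (gf2_rank, satisfies_reconstruction) are hypothetical. A set of k nodes can reconstruct the file exactly when the corresponding k columns of the generator matrix have rank 4 over GF(2). For this matrix the check fails at k = 4, because the code has weight-3 codewords, and first succeeds at k = 5.

```python
from itertools import combinations

# Generator matrix of one systematic (7,4) Hamming code over GF(2).
# Assumption for illustration: node j stores the j-th coded bit (column j).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
N_NODES, N_DATA = 7, 4


def gf2_rank(vectors):
    """Rank over GF(2) of bit vectors encoded as integers."""
    basis = []
    for v in vectors:
        # XOR with a basis vector only when that clears its highest set bit.
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def column_as_int(j):
    """Pack column j of G into an integer, one bit per row."""
    return sum(G[row][j] << row for row in range(N_DATA))


def satisfies_reconstruction(k):
    """True if every set of k nodes determines all N_DATA data bits."""
    return all(
        gf2_rank(column_as_int(j) for j in nodes) == N_DATA
        for nodes in combinations(range(N_NODES), k)
    )


# Smallest k for which the reconstruction requirement holds (the k* of the text).
k_star = min(k for k in range(1, N_NODES) if satisfies_reconstruction(k))
print(k_star)
```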

Parameter d: We denote the number of nodes that a
newcomer can access during repair by d. For example, [7]
provides a detailed RC construction for the
design goal (n, k, d) = (10, 7, 9). Namely, each newcomer
can access d = 9 helpers and any k = 7 nodes can be used to
reconstruct the original file. At the same time, [7] also provides
high-level guidelines on how to use the RC to achieve the design
goal when (n, k, d) = (10, 7, 5). However, the RC can be an
overkill in this scenario since any RC construction in [7] that
can achieve (n, k, d) = (10, 7, 5) can
always achieve k* = d = 5. As a result, even though the
high-level design goal is to only protect against 10 - 7 = 3
failures under the constraint of accessing only d = 5 helpers
during repair, the RC in [7] cannot take advantage of this
relatively loose protection-level requirement since it always
has k* ≤ d = 5.

Note that the above observation does not mean that the
system designer should never use the RCs of [7] when the design
goal is (n, k, d) = (10, 7, 5). The reason is that these RCs
with BHS have many other advantages that may be very
appealing in practice, e.g., some very efficient algebraic code
construction methods im, allowing repair with (n - d) simultaneous
failures, and admitting efficient collaborative repair
when more than one node fails [20]. The fact that k* ≤ d
for any RC in [7] simply means that when the requirement
is (n, k, d) = (10, 7, 5), the system designer should be aware
that the RCs with BHS in [7] cannot take full advantage of
the relatively loose required protection level since we have in
this scenario k > d ≥ k*.

In this work, we focus on the design target k instead of the
actual performance parameter k*, since given the same k, the
actual k* value may depend on how we implement the codes.
For example, when locally repairable codes 13 are used, it is
possible to design a system with k = k* > d. However, when
RCs are used together with BHS, we always have k* ≤ d even
though the target protection level may satisfy k > d. For any
given (n, k, d) values, the goal of this paper is to compare the
best performance of any possible helper selection scheme that
can still satisfy the desired (n, k, d) values regardless of whether
it offers over-protection (k > k*) or not.

Parameter r: We denote the maximum number of nodes
that can be temporarily unavailable at any given time by
r. Specifically, if we denote the set of unavailable nodes
by U, then we must have |U| ≤ r. If we also denote the
failed node by F, the design goal is to repair node F when
the nodes in U are unavailable. The unavailability of nodes
in U may be due to degraded reads, multiple failures, or
underlying network dynamics. In this work we do not consider
repair collaboration. That means, even when we have multiple
failures, say both nodes i and j fail simultaneously, we repair
each node separately. For example, we set F = i and U = {j}
when repairing node i and we set F = j and U = {i} when
repairing node j. Some repair cooperation schemes that jointly
repair both nodes i and j can be found in [20].

The range of the design criteria (n, k, d, r): Due to the
nature of the distributed storage problem, we only consider
(n, k, d, r) values that satisfy

$$ 2 \le n; \quad 1 \le k \le n - 1; \quad 1 \le d; \quad \text{and} \quad d \le n - 1 - r. \tag{1} $$

In all the results in this work, we assume implicitly that the
(n, k, d, r) values satisfy (1).
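
As a small illustration, the range in (1) can be encoded as a predicate. This is a sketch only: the function name is hypothetical, the inequalities are read as non-strict, and r ≥ 0 is an added assumption that the text leaves implicit.

```python
def is_valid_design(n, k, d, r):
    """Check the range (1): 2 <= n, 1 <= k <= n - 1, 1 <= d <= n - 1 - r.

    The condition r >= 0 is an assumption added here for completeness.
    """
    return r >= 0 and n >= 2 and 1 <= k <= n - 1 and 1 <= d <= n - 1 - r


# With n = 10 nodes, d = 9 helpers leave no room for an unavailable node.
print(is_valid_design(10, 7, 9, 0), is_valid_design(10, 7, 9, 1))
```

The last constraint reflects that, besides the failed node, up to r of the remaining n - 1 nodes may be unavailable, so at most n - 1 - r helpers can be guaranteed.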

Parameters α, β, and ℳ: The overall file size is denoted
by ℳ. The storage size for each node is α, and during the
repair process, the newcomer requests β amount of traffic from
each of the helpers. The total repair-bandwidth is thus γ = dβ.

B. Types of Helper Selection Schemes

DHS: We consider the most general form of helper selection
in which the helper selection at current time τ can depend on
the time index τ and the history of node failures and node
unavailability from all the previous time slots 1 to (τ - 1).
We term this type of schemes the dynamic helper selection
(DHS) scheme. Mathematically, for every time slot τ (or
equivalently for the τ-th repair) the helper set decision at time
τ can be written in function form as

$$ D_\tau\big(\{F_i\}_{i=1}^{\tau}, \{U_j\}_{j=1}^{\tau}\big) $$

that returns the set of helpers the newcomer has to access
at time τ, where F_i and U_j are the failed node at time i
and the set of unavailable nodes at time j, respectively. Since
the helper selection function depends on the history of the
failure/unavailability patterns and can change for each different
time τ, we term this scheme the dynamic helper selection
(DHS) scheme, for which the term “dynamic” emphasizes the
time and history dependence of the helper selection rules.

SHS: A subset of the DHS schemes is the stationary helper
selection (SHS) schemes that assign fixed helper sets of d
nodes to each combination of a failed node and a set of
unavailable nodes. Mathematically, the helper set decision at
time τ in SHS can be written in function form as D(F_τ, U_τ).
The function D(·, ·) does not change with respect to the value
of τ and the input arguments of D(·, ·) are F_τ and U_τ, instead
of the entire history {F_i}_{i=1}^{τ} and {U_j}_{j=1}^{τ}. The idea
is that, for a given node failure and a given set of unavailable
nodes, the same helper set is used at any time instant. We can
see that this construction is stationary because the helper sets
do not change with time and only depend on the failure
and node unavailability information at the current time τ.

All existing (non-blind) helper selection schemes can be
interpreted as a form of SHS. For example, a popular way
of helper selection when there are r temporarily unavailable
nodes ED, 03, 03, on is as follows. Each node F
is assigned a fixed set of (d + r) candidate helper nodes.
When node F needs to be repaired, since at most r nodes
are temporarily unavailable, there are at least d nodes that
are still available in the candidate set. Then the newcomer
arbitrarily contacts d available nodes in the candidate helper
set. Mathematically, such a scheme can be interpreted as an
SHS scheme in the following way. Denote by D(F_τ) the set of
(d + r) candidate helpers of the failed node F_τ at time τ. Again
let U_τ denote the collection of temporarily unavailable nodes.
Without loss of generality, we assume that U_τ ⊆ D(F_τ) and
|U_τ| = r. Namely, there are exactly r unavailable nodes and
all of them are within the candidate set D(F_τ). This is possible
since any scheme has to consider the worst-case³ scenario, in
which all r unavailable nodes are within D(F_τ). Then, we
simply set the SHS function D(F_τ, U_τ) by

$$ D(F_\tau, U_\tau) = D(F_\tau) \setminus U_\tau. \tag{2} $$

Since D(F_τ) has (d + r) nodes and |U_τ| = r, the function
D(F_τ, U_τ) indeed returns d helpers for the given (F_τ, U_τ).
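
The candidate-set rule above can be sketched in a few lines. This is a minimal illustration, not a construction from the literature: the cyclic choice of the candidate set D(F) and the function names are hypothetical, and nodes are labeled 0 to n - 1. The point is that the returned helper set depends only on the current failed node and the current unavailable set, never on the time index or the history.

```python
def candidate_set(failed, n, d, r):
    """Hypothetical D(F): the d + r nodes that follow F in cyclic order."""
    return [(failed + i) % n for i in range(1, d + r + 1)]


def shs_helpers(failed, unavailable, n, d, r):
    """Stationary rule in the spirit of (2): D(F, U) = D(F) minus U, cut to d nodes."""
    if len(unavailable) > r:
        raise ValueError("more than r nodes are unavailable")
    available = [v for v in candidate_set(failed, n, d, r) if v not in unavailable]
    # With |U| <= r, at least d candidates survive; take the first d of them.
    return available[:d]


# (n, d, r) = (5, 2, 1): node 0 fails while node 2 is unavailable.
print(shs_helpers(0, {2}, n=5, d=2, r=1))
```

When the unavailable nodes lie outside D(F), more than d candidates remain and the sketch simply keeps the first d, which is one way to realize the "arbitrarily contacts d available nodes" step.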

³A more rigorous formulation has to take an adversarial approach. Namely,
if U_τ is not completely inside D(F_τ) or |U_τ| < r, then we simply let an
adversary choose a new U'_τ satisfying (U_τ ∩ D(F_τ)) ⊆ U'_τ ⊆ D(F_τ)
and |U'_τ| = r.


BHS: The last type of helper selection, which is the most
basic, is blind helper selection (BHS) that allows the newcomer
to access any arbitrarily selected d nodes of the surviving
nodes. This scheme was initially assumed for RCs in [7].

C. Helper Selection Schemes In Existing Works 

The helper selection schemes of all existing LRC constructions
are SHS. Specifically, for r = 0 (i.e., nodes are always
available), LRCs [9], E3, ESl use SHS where each node is
assigned a fixed set of d helper nodes. For r > 0, LRCs ED,
ED assign each node a fixed set of (d + r) helper nodes and
during repair the newcomer can arbitrarily connect to any d
nodes of the (d + r) nodes in its helper set. As explained
in Section II-B, such a scheme is a special form of SHS.
Almost all LRCs considered in the existing literature use the
above method to handle temporary node unavailability. To our
knowledge, the only example in the literature that does not
use the above helper selection method is in E3]|, for which the
helper selection is based on a carefully designed D(F_τ, U_τ)
instead of (2).

III. Information Flow Graphs and the 
Corresponding Graph-Based Analysis 

Before introducing our main results, we quickly explain
the concept of information flow graphs (IFGs) and the
corresponding analysis, which was first introduced in [7]. For
readers who are not familiar with IFGs, we provide a detailed
description in the appendix.

Intuitively, each IFG reflects one unique history of the
failure patterns and the helper selection choices from time 1
to (τ - 1) 121. Consider any given helper selection scheme
A, which can be either DHS or SHS. Since there are
infinitely many different failure patterns F_τ and infinitely many
different unavailable node sets U_τ (since we consider τ =
1 to ∞), there are infinitely many IFGs corresponding to
the same given helper selection scheme A, since the IFG
grows according to the helper choices D_τ({F_i}_{i=1}^{τ}, {U_j}_{j=1}^{τ})
for DHS or D(F_τ, U_τ) for SHS. We denote the collection
of all possible IFGs of a given helper selection scheme
A by 𝒢_A(n, k, d, r, α, β). We define 𝒢(n, k, d, r, α, β) =
∪_{∀A} 𝒢_A(n, k, d, r, α, β) as the union over all possible helper
selection schemes A. We sometimes drop the input argument
and use 𝒢_A and 𝒢 as shorthands. The collection 𝒢 can also
be viewed as the IFGs generated by BHS. The reason is
that BHS blindly selects the helpers and thus will take into
consideration all possible ways of growing the IFG. As a
result, 𝒢_BHS = 𝒢 = ∪_{∀A} 𝒢_A(n, k, d, r, α, β).

Given an IFG G ∈ 𝒢, we use DC(G) to denote the
collection of all data collector nodes in G [7]. Each data
collector t ∈ DC(G) represents one unique way of choosing
k out of n active nodes when reconstructing the file. Given an
instance of the IFGs G ∈ 𝒢 and a data collector t ∈ DC(G),
we use mincut_G(s, t) to denote the minimum cut value [22]
separating s, the root node (source node) of G, and t.

For any helper scheme A and given system parameters
(n, k, d, r, α, β), the results in [7] prove that the following
condition is necessary for the existence of any distributed
storage network with helper selection scheme A that can meet
the design requirement (n, k, d, r, α, β):

$$ \min_{G \in \mathcal{G}_A} \min_{t \in \mathrm{DC}(G)} \operatorname{mincut}_G(s, t) \ge \mathcal{M}. \tag{3} $$

If we limit our focus to a distributed storage network with
BHS, then the above necessary condition becomes

$$ \min_{G \in \mathcal{G}} \min_{t \in \mathrm{DC}(G)} \operatorname{mincut}_G(s, t) \ge \mathcal{M}. \tag{4} $$

Reference [7] later found a closed-form expression of the
left-hand side (LHS) of (4):

$$ \min_{G \in \mathcal{G}} \min_{t \in \mathrm{DC}(G)} \operatorname{mincut}_G(s, t) = \sum_{i=0}^{k-1} \min\big((d - i)^{+} \beta, \alpha\big), \tag{5} $$

where (x)^+ = max(x, 0), which allows us to numerically
check whether (4) holds for any (n, k, d, r, α, β) values.
Specifically, the necessary condition (4) becomes

$$ \sum_{i=0}^{k-1} \min\big((d - i)^{+} \beta, \alpha\big) \ge \mathcal{M}. \tag{6} $$

Reference 12^ further proves that when considering a
fixed but sufficiently large finite field GF(q), (6) is not only
necessary but also sufficient for the existence of a BHS-based
distributed storage network that meets the design requirement
(n, k, d, r, α, β).
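
The closed-form sum in (5) and (6) is easy to evaluate numerically. The following sketch (the function name is hypothetical) computes the largest file size that condition (6) permits under BHS for given (k, d, α, β). It also illustrates the earlier remark on (n, k, d) = (10, 7, 5): because the terms with i ≥ d vanish, the bound for k = 7 coincides with the bound for k = 5.

```python
def bhs_file_size_bound(k, d, alpha, beta):
    """Left-hand side of (6): sum over i = 0..k-1 of min((d - i)^+ * beta, alpha)."""
    return sum(min(max(d - i, 0) * beta, alpha) for i in range(k))


# (k, d) = (7, 5) with alpha = 3, beta = 1: terms are 3, 3, 3, 2, 1, 0, 0.
print(bhs_file_size_bound(7, 5, alpha=3, beta=1))
# The same value is obtained for k = 5, since the terms with i >= d are zero.
print(bhs_file_size_bound(5, 5, alpha=3, beta=1))
```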

Fixing the values of (n, k, d, r), two points on the
storage-bandwidth tradeoff curve of any given helper selection scheme
A are of special interest: the minimum bandwidth regenerating
(MBR) and minimum storage regenerating (MSR) points.
These points can be defined as follows.

Definition 1: For any given (n, k, d, r) values, the MBR
point (α_MBR, β_MBR) of a helper scheme A is defined by

$$ \beta_{\mathrm{MBR}} = \min\{\beta : (\alpha, \beta) \text{ satisfies (3) and } \alpha = \infty\} \tag{7} $$

$$ \alpha_{\mathrm{MBR}} = \min\{\alpha : (\alpha, \beta) \text{ satisfies (3) and } \beta = \beta_{\mathrm{MBR}}\}. \tag{8} $$

Definition 2: For any given (n, k, d, r) values, the MSR
point (α_MSR, β_MSR) of a helper scheme A is defined by

$$ \alpha_{\mathrm{MSR}} = \min\{\alpha : (\alpha, \beta) \text{ satisfies (3) and } \beta = \infty\} \tag{9} $$

$$ \beta_{\mathrm{MSR}} = \min\{\beta : (\alpha, \beta) \text{ satisfies (3) and } \alpha = \alpha_{\mathrm{MSR}}\}. $$

Specifically, the MBR and MSR points are the two extreme
ends⁴ of the bandwidth-storage tradeoff curve in (3).
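
For the BHS baseline, the two definitions can be worked out directly from the closed-form condition (6). The sketch below is an illustration under that assumption only (it does not apply to general schemes A, whose IFG collections differ); the function names are hypothetical. At the MBR point no term of the sum is clipped by α, so β_MBR = ℳ / Σ_i (d - i)^+ and α_MBR = dβ_MBR. At the MSR point every nonzero term equals α, so α_MSR = ℳ / min(d, k) and β_MSR = α_MSR / (d - min(d, k) + 1).

```python
from fractions import Fraction


def bhs_file_size_bound(k, d, alpha, beta):
    """Left-hand side of (6)."""
    return sum(min(max(d - i, 0) * beta, alpha) for i in range(k))


def bhs_mbr_point(k, d, file_size):
    """(alpha, beta) at the MBR point of BHS, derived from (6)."""
    weight = sum(max(d - i, 0) for i in range(k))
    beta = Fraction(file_size, weight)
    return d * beta, beta


def bhs_msr_point(k, d, file_size):
    """(alpha, beta) at the MSR point of BHS, derived from (6)."""
    m = min(d, k)  # number of nonzero terms in the sum
    alpha = Fraction(file_size, m)
    return alpha, alpha / (d - m + 1)


# Classic regime d >= k: (k, d) = (7, 9) with a file of 21 packets.
print(bhs_msr_point(7, 9, 21))
# Local-repair regime d < k: (k, d) = (3, 2) with a file of 3 packets.
print(bhs_mbr_point(3, 2, 3), bhs_msr_point(3, 2, 3))
```

Exact rational arithmetic via Fraction avoids rounding when checking that the returned points meet (6) with equality.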

The above graph-based analysis also allows us to define the 
optimality of different helper selection schemes. 

⁴An alternative definition of the MSR point is when each node only stores
α_MSR = ℳ/k packets. The difference between these two definitions is as
follows. The α_MSR in (9) is the smallest possible storage under a given helper
selection scheme A and given reliability requirement (n, k, d, r). In contrast,
the alternative definition α_MSR = ℳ/k is the smallest possible storage that
can be achieved by an (n, k) erasure code, which always requests repair data
from k helpers. For example, when (n, k, d, r) = (5, 3, 2, 1), one can prove
that regardless of how one designs the helper selection scheme A, we always
have α_MSR > ℳ/k = ℳ/3. Namely, the smallest achievable storage α_MSR is
strictly larger than ℳ/3. Since no scheme can possibly achieve α = ℳ/3, the
alternative MSR definition will say that the MSR point is not achievable for
the parameter (n, k, d, r) = (5, 3, 2, 1).


Definition 3: For any given (n, k, d, r) values, a helper
selection scheme A is optimal if, for any DHS scheme B,
the following is true

$$ \min_{G \in \mathcal{G}_A} \min_{t \in \mathrm{DC}(G)} \operatorname{mincut}_G(s, t) \ge \min_{G \in \mathcal{G}_B} \min_{t \in \mathrm{DC}(G)} \operatorname{mincut}_G(s, t) $$

for all (α, β) combinations. That is, scheme A has the best
(α, β) tradeoff curve among all DHS schemes and thus allows
for the protection of the largest possible file size.

IV. The Main Results 

The main contributions of this work are the answers to
the following two questions. Question 1: When designing an
optimal helper selection scheme, is it sufficient to limit the
search scope to only considering SHS schemes?⁵ Question 2:
We observe that for some (n, k, d, r) values, even the best
DHS/SHS schemes do not do better than the simplest BHS
scheme. We call such (n, k, d, r) values indifferent to
helper selection since for those (n, k, d, r) the BHS is as good
as any scheme. The question to be answered is thus: for any
arbitrarily given (n, k, d, r), is there any way to quickly check
whether it is indifferent to helper selection or not?

We answer the first question in the following Propositions 1
and 2 and answer partially the second question in
Propositions 3 to 6.

Proposition 1: For (n, k, d, r) = (5, 3, 2, 1) and (5, 4, 2, 1),
and any arbitrary (α, β) values, there exists no SHS scheme
that can protect a file of size larger than that of BHS.

Proposition 2: For (n, k, d, r) = (5, 3, 2, 1) and (5, 4, 2, 1),
there exists a pair of (α, β) values such that one can find
a DHS scheme that can protect a file of size strictly larger
than that of BHS. Furthermore, we explicitly devise a DHS
scheme that is provably optimal for (n, k, d, r) = (5, 3, 2, 1)
and (5, 4, 2, 1). See Definition 3.

We can see, by Propositions 1 and 2, that using DHS we can protect
a file size strictly larger than that of the best SHS scheme. This
answers Question 1 by showing that SHS is not enough to
achieve the optimal performance for (n, k, d, r) = (5, 3, 2, 1)
and (5, 4, 2, 1). At least for these two parameter values, a DHS
scheme is necessary. A byproduct of our optimal DHS scheme
is that it achieves the MBR and MSR points simultaneously.
Specifically, it simultaneously minimizes the bandwidth and
storage for (n, k, d, r) = (5, 3, 2, 1) and (5, 4, 2, 1).

The following proposition answers the second question by 
providing conditions that can be used to check whether a given 
(n, k, d, r) value is indifferent to helper selection or not. 

Proposition 3: If the following inequality

$$ k \le \left\lceil \frac{n - r}{n - d - r} \right\rceil \tag{10} $$

holds, then for any arbitrary (α, β) values there exists no DHS
scheme that can protect a file of size larger than that of BHS.

⁵A simple analogy is as follows. It is known that for binary symmetric
channels linear codes are capacity-achieving. Namely there is no need to 
search for non-linear codes. When considering network coding, again linear 
codes are capacity-achieving for the single multicast setting. But the seminal 
results in 0 prove that linear codes are not sufficient for the multiple unicast 
setting. For this work, we would like to answer the question whether SHS is 
sufficient (capacity-achieving) for all (n,k,d,r) values? 

If the following inequality

$$ \min(d + 1, k) > \left\lceil \frac{n}{n - d - r} \right\rceil \tag{11} $$

holds, then there exists an SHS scheme and a pair of (α, β)
such that we can protect a file of size strictly larger than that
of BHS.

There are some (n, k, d, r) values that satisfy neither (10)
nor (11), for which it remains open whether those (n, k, d, r)
are indifferent to helper selection or not. Therefore the
characterization in Proposition 3 is not tight.

For the cases⁶ of r ≤ 1, we can further sharpen the results
as follows.

Proposition 4: E] Proposition 1] For any (n, k, d, r) values
satisfying r = 0, if either (10) or

$$ d = 1, \quad k = 3, \quad \text{and } n \text{ is odd} \tag{12} $$

holds, then for any (α, β) values, there exists no DHS scheme
that can protect a file of size larger than that of BHS. If neither
(10) nor (12) holds, then there exists an SHS scheme and a
pair of (α, β) such that we can protect a file of size strictly
larger than that of BHS.

Proposition 5: For any (n, k, d, r) values satisfying r = 1,
d = 1, if either (i) (10) holds or (ii) k = 3 or (iii)

$$ k = 4 \quad \text{and} \quad n \bmod 3 \ne 0 \tag{13} $$

holds, then for any (α, β) values, there exists no DHS scheme
that can protect a file of size larger than that of BHS. If none
of (i)-(iii) holds, then there exists an SHS scheme and a pair
of (α, β) such that we can protect a file of size strictly larger
than that of BHS.

Proposition 6: For any (n, k, d, r) value satisfying r = 
1 , d = 2 , if (fTol) does not hold then there exists a DHS scheme 
and a pair of (a, /3) such that we can protect a file of size 
strictly larger than that of BHS. 

Propositions 4 to 6 close the gap in Proposition 3 and
provide a tight characterization for the cases of “r = 0” and
“r = 1, d ≤ 2.” Propositions 3 to 6 quickly lead to the
following corollary.

Corollary 1: For any (n, k, d, r) satisfying r ≤ 1, d ≤ 5,
and

$$(n,k,d,r) \notin \{(7,3,3,1),\ (9,3,4,1),\ (7,4,4,1),\ (11,3,5,1)\}, \tag{14}$$

we can easily determine whether (n, k, d, r) is indifferent
to helper selection or not by checking some very simple
conditions.
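The checks behind Corollary 1 can be run mechanically. The following is only an illustrative sketch, not part of the original analysis. It assumes that (10) reads k ≤ ⌈(n − r)/(n − d − r)⌉ (the form restated in Proposition 10), that (11) reads min(d + 1, k) > ⌈n/(n − d − r)⌉, and that valid parameters satisfy 1 ≤ k ≤ n − 1 and n ≥ d + r + 1. All function names are hypothetical. Under these assumptions, the tuples with r = 1 and 3 ≤ d ≤ 5 that satisfy neither condition are exactly the four tuples excluded in (14), and for r = 1, d = 2 only (5, 3, 2, 1) and (5, 4, 2, 1) remain.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for a positive integer b."""
    return -(-a // b)


def satisfies_10(n, k, d, r):
    # Assumed form of (10): no DHS scheme can beat BHS when this holds.
    return k <= ceil_div(n - r, n - d - r)


def satisfies_11(n, k, d, r):
    # Assumed form of (11): some SHS scheme beats BHS when this holds.
    return min(d + 1, k) > ceil_div(n, n - d - r)


def undecided(r, d_values, n_max=40):
    """Tuples (n, k, d, r) that satisfy neither (10) nor (11)."""
    found = set()
    for d in d_values:
        # Assumed parameter range: n >= d + r + 1 and 1 <= k <= n - 1.
        for n in range(d + r + 1, n_max + 1):
            for k in range(1, n):
                if not satisfies_10(n, k, d, r) and not satisfies_11(n, k, d, r):
                    found.add((n, k, d, r))
    return found


if __name__ == "__main__":
    print(sorted(undecided(r=1, d_values=[3, 4, 5])))  # exceptions in (14)
    print(sorted(undecided(r=1, d_values=[2])))        # instances left to Proposition 6
```

The search bound `n_max` is arbitrary; for n ≥ 2d + 2r both ceilings equal 2, so no further undecided tuples appear beyond small n.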

The proofs of Propositions 1 and 2 will be presented in
Section V. The proofs of the converse and the achievability
parts of Proposition 3 will be provided in Sections VI-A and
VI-B, respectively.

Proposition 4 focuses on the special case of r = 0 and is
a restatement of the results in [13, Proposition 1]. The proof
of Proposition 5 is relegated to Appendix B. We close this
section by providing the proof of Proposition 6, which reveals
a connection between Proposition 6 and Propositions 1 and 2.


^Arguably, the cases of small r are more interesting from a practical 
perspective. 


Proof of Proposition 6: By simple counting arguments
provided in the appendix, one can show that among all
(n, k, d, r) satisfying r = 1 and d = 2, there are only two
instances, (n, k, d, r) = (5, 3, 2, 1) and (5, 4, 2, 1), that satisfy
neither (10) nor (11). Namely, any other (n, k, d, r) satisfies
at least one of (10) and (11). By Proposition 3, we only need
to decide whether a given (n, k, d, r) is indifferent to helper
selection for these two instances.

For these two instances, Propositions 1 and 2 show that there
exists a DHS scheme and a pair of (α, β) such that we can
protect a file of size strictly larger than that of BHS. The proof
is thus complete. ■

V. Stationary Helper Selection Is Insufficient

The proofs of Propositions 1 and 2 are provided in
Sections V-A and V-B, respectively. Jointly, they prove that SHS
is insufficient in terms of achieving the optimal repair BW. A
byproduct of the results in Propositions 1 and 2 is a simple
proof showing that functional repair can be strictly better than
exact repair, which is provided in Section V-D.


A. Proof of Proposition 1

Fig. 1. Storage-bandwidth tradeoff curves of LRRCs with DHS, LRRCs
with SHS, and RCs with BHS for (n, k, d, r) = (5, 3, 2, 1).

We first consider the case of (n, k, d, r) = (5, 3, 2, 1). Since
BHS is used, the newcomer can access any d = 2 out of the
3 = n − r − 1 available nodes, and BHS thus naturally handles
node unavailability (r = 1). The storage-BW tradeoff when
BHS is used can then be derived directly by plugging
(n, k, d) = (5, 3, 2) into (6). Namely, as long as the BHS policy
is used, the storage-BW tradeoff curve must satisfy

$$\min(2\beta, \alpha) + \min(\beta, \alpha) \ge \mathcal{M}, \tag{15}$$

where ℳ is the file size. A normalized storage-BW tradeoff
curve of (15) is plotted in Fig. 1. Namely, if each node stores
only half of the overall file, α/ℳ = 0.5, then the normalized
repair BW is dβ/ℳ = 1. However, if we are willing to use a larger
normalized storage size α/ℳ = 2/3 rather than 1/2, we can reduce
the normalized BW from 1 to 2/3. Note that when BHS
is used, we are essentially analyzing the original RCs in [7]
with (n, k, d) = (5, 3, 2). Therefore, the actual protection level
satisfies k* = d = 2, which is strictly smaller than the target
protection level k = 3. This means that any code construction
with BHS is overprotecting the data.
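The two operating points of the BHS curve follow directly from the constraint min(2β, α) + min(β, α) ≥ ℳ. The short derivation below is added for convenience and uses only that constraint together with d = 2.

$$
\begin{aligned}
\alpha = \tfrac{\mathcal{M}}{2}:&\quad \min(2\beta,\alpha)+\min(\beta,\alpha) \le 2\alpha = \mathcal{M}
\ \Longrightarrow\ \min(\beta,\alpha)=\alpha
\ \Longrightarrow\ \beta \ge \tfrac{\mathcal{M}}{2},\quad d\beta = 2\beta \ge \mathcal{M},\\[4pt]
\alpha = \tfrac{2\mathcal{M}}{3}:&\quad \beta = \tfrac{\mathcal{M}}{3}
\ \Longrightarrow\ \min\!\big(\tfrac{2\mathcal{M}}{3},\tfrac{2\mathcal{M}}{3}\big)+\min\!\big(\tfrac{\mathcal{M}}{3},\tfrac{2\mathcal{M}}{3}\big)=\mathcal{M},
\quad d\beta = \tfrac{2\mathcal{M}}{3}.
\end{aligned}
$$

In the second case no smaller β works, since for β ≤ ℳ/3 the left-hand side equals 3β.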

In the following, we will show that even if one is allowed
to choose the helpers in an intelligent way (other than BHS),
we are still not able to improve the storage-BW tradeoff curve
in (15) if we are restricted to using only SHS. One implication
of this result is that when (n, k, d, r) = (5, 3, 2, 1), any
existing/future LRC scheme will have the same performance
as the original RC if restricted to using only SHS schemes.

Consider any SHS scheme A with the corresponding
stationary helper selection function being D(F, U), where F
is the failed node and U = {j} is the set containing the
temporarily unavailable node j, since we now focus on r = 1.
For simplicity, we sometimes just say node U is unavailable
when it is clear from the context that |U| = r = 1. Recall that
𝒢_A is the collection of IFGs that are grown according to the
helper selection scheme A. The main idea is to prove that no
matter how we design D(F, U), we are bound to have a
graph G* ∈ 𝒢_A for which min_{t ∈ DC(G*)} mincut_{G*}(s, t) equals
the LHS of (15). This implies that

$$\min_{G \in \mathcal{G}_A}\ \min_{t \in \mathrm{DC}(G)} \operatorname{mincut}_G(s,t) \le \min(2\beta, \alpha) + \min(\beta, \alpha). \tag{16}$$

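By the fact that 𝒢_A ⊆ 𝒢, and by (5), we thus have

$$
\begin{aligned}
\min_{G \in \mathcal{G}}\ \min_{t \in \mathrm{DC}(G)} \operatorname{mincut}_G(s,t)
&\le \min_{G \in \mathcal{G}_A}\ \min_{t \in \mathrm{DC}(G)} \operatorname{mincut}_G(s,t)\\
&\le \min(2\beta, \alpha) + \min(\beta, \alpha)\\
&= \min_{G \in \mathcal{G}}\ \min_{t \in \mathrm{DC}(G)} \operatorname{mincut}_G(s,t).
\end{aligned}
\tag{17}
$$

As a result, all the inequalities in (17) must be equalities. The
storage-BW tradeoff curve of helper scheme A is thus
identical to the RC tradeoff in (15).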

To that end, we will prove that, regardless of how we design
the helper selection function D(F, U), we can always find a
graph G* ∈ 𝒢_A such that there exist 3 active nodes x, y,
and z satisfying (i) each node has been repaired at least once,
(ii) x is a helper when repairing y, and (iii) both x and
y are the helpers when repairing z. Considering the cut in
G* that directly separates the source (root) from {x, y, z},
we can observe that node x will contribute min(2β, α) to
the cut value; node y will contribute min(β, α) to the cut
value since node x was the helper of node y; and node
z will contribute 0 to the cut value since both x and y
are the helpers of z. Therefore, the cut that separates the
source (root) directly from {x, y, z} will have the cut value
min(2β, α) + min(β, α). As a result, the min-cut value
mincut_{G*}(s, {x, y, z}) is no larger than the LHS of (15). We
have thus proved (16).

We prove the existence of such an IFG G* ∈ 𝒢_A by
contradiction. Without loss of generality, suppose that
D(1, {4}) = {2, 3} in the SHS scheme A. Namely, if node 1 fails
and node 4 is not available, then the newcomer (node 1) will
access nodes 2 and 3 as helpers. This assumption can always be
made true by relabeling the nodes. We consider the following
2 cases.

Case 1: D(2, {1}) ≠ {4, 5}, or D(2, {4}) ≠ {1, 5}.
Consider the following two subcases.

Case 1.1: D(2, {1}) ≠ {4, 5}. Since D(2, {1}), by definition,
returns a subset of {1, 2, 3, 4, 5} \ ({F} ∪ U) = {3, 4, 5}, we
must have either D(2, {1}) = {3, 4} or {3, 5}. We now fail node 3
first and repair it following scheme A. (The rule that scheme A
uses to repair node 3 is irrelevant to our proof.) Then, fail node
2 and suppose node 1 is unavailable. Since D(2, {1}) = {3, 4}
or {3, 5} in Case 1.1, node 3 will definitely be a helper of node
2. Next, fail node 1 and assume node 4 is unavailable. Since
D(1, {4}) = {2, 3}, node 1 will access nodes 2 and 3 for
repair. We have thus constructed a G* in which nodes
(x, y, z) = (3, 2, 1) satisfy properties (i) to (iii)
in the previous paragraph. The proof for Case 1.1 is complete.

Case 1.2: D(2, {4}) ≠ {1, 5}. Since D(2, {4}) returns a
subset of {1, 3, 5}, we must have either D(2, {4}) = {1, 3}
or {3, 5}. Similar to Case 1.1, we fail node 3 first. Then,
we fail node 2 and assume that node 4 is unavailable. Since
D(2, {4}) = {1, 3} or {3, 5}, node 3 must be a helper of node
2. Finally, fail node 1 and assume that node 4 is unavailable.
Since D(1, {4}) = {2, 3}, node 1 will access nodes 2 and 3
for repair. In the end, nodes (x, y, z) = (3, 2, 1) satisfy (i) to
(iii).

Case 2: D(2, {1}) = {4, 5} and D(2, {4}) = {1, 5}.
We consider two subcases.

Case 2.1: D(1, {3}) ≠ {4, 5}. Therefore, we must have
D(1, {3}) = {2, 4} or {2, 5}. For ease of exposition, we say
D(1, {3}) = {2, v}, where v is either node 4 or node 5. We now
fail node v first and repair it under scheme A. (The rule that
scheme A uses to repair node v is irrelevant to our proof.) We
then fail node 2 and assume that node 1 is unavailable. Since
D(2, {1}) = {4, 5}, nodes 4 and 5 are the helpers of node
2; in particular, node v is a helper of node 2. Then, fail node 1
and assume node 3 is unavailable. Since D(1, {3}) = {2, v},
nodes 2 and v are the helpers of node 1. Observe that
(x, y, z) = (v, 2, 1) satisfy (i) to (iii) and we
have thus constructed such a G*.

Case 2.2: D(1, {3}) = {4, 5}. We fail node 5 first and repair
it under scheme A. (The rule that scheme A uses to
repair node 5 is irrelevant to our proof.) We then fail node 1
while assuming that node 3 is unavailable. Since D(1, {3}) =
{4, 5}, nodes 4 and 5 are the helpers of node 1. Then, fail
node 2 and assume that node 4 is unavailable. Since in Case 2
we have D(2, {4}) = {1, 5}, nodes 5 and 1 are the helpers
of node 2. Observe that (x, y, z) = (5, 1, 2) satisfy (i) to (iii)
and we have thus constructed such a G*.
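The case analysis above reduces to a purely combinatorial property of the rule D(F, U): there must be nodes x, y, z and unavailable nodes u₁, u₂ with x ∈ D(y, {u₁}) and D(z, {u₂}) = {x, y}. The sketch below is an illustration only and not part of the proof. It draws stationary rules at random and searches for such a triple; the names `random_shs` and `find_bad_triple` are hypothetical, and random sampling stands in for "any SHS scheme" because enumerating all 3²⁰ rules would be slow.

```python
import itertools
import random

NODES = (1, 2, 3, 4, 5)  # n = 5, d = 2, r = 1


def random_shs(rng):
    """Draw an arbitrary stationary rule: (failed, unavailable) -> 2 helpers."""
    rule = {}
    for failed in NODES:
        for unavailable in NODES:
            if unavailable != failed:
                pool = [v for v in NODES if v not in (failed, unavailable)]
                rule[(failed, unavailable)] = frozenset(rng.sample(pool, 2))
    return rule


def find_bad_triple(rule):
    """Return (x, y, z) with x a helper of y and {x, y} the helpers of z.

    Repairing x, then y, then z in this order yields active nodes that
    satisfy properties (i) to (iii) of the proof.
    """
    for (z, _), helpers in rule.items():
        for x, y in itertools.permutations(helpers, 2):
            for u in NODES:
                if u != y and x in rule[(y, u)]:
                    return (x, y, z)
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    print(all(find_bad_triple(random_shs(rng)) is not None for _ in range(1000)))
```

A rule for which `find_bad_triple` returned `None` would be a counterexample to the statement being proved, so the search doubles as a sanity check on the argument.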

Thus far, we have completed the proof for the case of
(n, k, d, r) = (5, 3, 2, 1). We now discuss how to prove the
case of (n, k, d, r) = (5, 4, 2, 1). By (6), one can directly
prove that when BHS is used, the storage-BW tradeoff curve
of (n, k, d, r) = (5, 4, 2, 1) is also governed by (15).

To prove that the storage-BW tradeoff curve of SHS is
also (15), we will prove that regardless of how we choose the
helper selection function D(F, U), there always exists a graph
G** ∈ 𝒢_A such that there exist 4 active nodes x, y, z, and w
satisfying (i) each node has been repaired at least once,
(ii) x is a helper when repairing y, (iii) both x and y are the
helpers when repairing z, and (iv) the helper nodes of w are
a subset of {x, y, z}.

By the discussion in the previous proof of (n, k, d, r) =
(5, 3, 2, 1), we can always find a G* such that there exist three
active nodes (x, y, z) satisfying (i) to (iii). Without loss of
generality, assume the three active nodes are (x, y, z) = (1, 2, 3).
Then we fail node 4 and assume node 5 is unavailable. Since
there are only 3 remaining nodes {1, 2, 3}, regardless of how we
choose D(4, {5}), the helpers of node 4 must be a subset of
{1, 2, 3}. We call the IFG after repairing node 4 G**. Choose
w = 4. Then nodes (x, y, z, w) = (1, 2, 3, 4) satisfy (i) to (iv).

By similar arguments, one can easily check that the min-cut
separating the root of G** and the four nodes (x, y, z, w)
is at most min(2β, α) + min(β, α). Therefore, the storage-BW
tradeoff curve for any SHS scheme with (n, k, d, r) =
(5, 4, 2, 1) must again be (15). The proof of Proposition 1 is
thus complete.

B. Proof of Proposition 2

We first consider (n, k, d, r) = (5, 3, 2, 1) and prove that
there exists a DHS scheme that has the following new, strictly
better tradeoff curve:

$$2\min(2\beta, \alpha) \ge \mathcal{M}, \tag{18}$$

i.e., it strictly outperforms the best possible SHS scheme, for
which the tradeoff is governed by (15). Our proof is by explicit
code construction with β = 1, α = 2, and ℳ = 4, which
achieves the corner point of (18); also see Fig. 1. The scheme
consists of two parts. Part I: How to choose the helper nodes
for a newcomer? Part II: What is the coded data sent by each
helper after the helpers are decided?
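As a quick numerical illustration of the gap between the two tradeoffs at the operating point β = 1, α = 2, the sketch below evaluates the left-hand sides min(2β, α) + min(β, α) of (15) and 2 min(2β, α) of (18). The function names are hypothetical and chosen only for this example.

```python
def shs_file_size(alpha, beta):
    """LHS of (15): largest file size with BHS (and, by Proposition 1, with SHS)."""
    return min(2 * beta, alpha) + min(beta, alpha)


def dhs_file_size(alpha, beta):
    """LHS of (18): file size supported by the DHS tradeoff."""
    return 2 * min(2 * beta, alpha)


if __name__ == "__main__":
    alpha, beta = 2, 1
    print(shs_file_size(alpha, beta), dhs_file_size(alpha, beta))  # prints: 3 4
```

With the same per-node storage and per-helper traffic, the DHS tradeoff therefore admits a file of 4 packets where the SHS tradeoff admits only 3.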

To describe Part I, we need the following notation. We say 
node i is the parent of node j if (i) node i was the helper of 
node j, and (ii) node i has not been repaired since the failure 
of node j. For example, say node 1 fails and accesses nodes 
2 and 3 as helpers. Then node 2 fails and accesses nodes 3 
and 4. After the above two repairs, node 3 is a parent of node 
1 but node 2 is not since node 2 has been repaired. On the 
other hand, both nodes 3 and 4 are parents of node 2. 

The main idea of the proposed DHS scheme is to choose 
helpers such that no 3 nodes ever form a “triangle”. Namely, 
we carefully choose the helpers of the newcomers so that we 
can avoid the existence of 3 nodes {x, y, z} such that x is the 
parent of both y and z; and y is the parent of z. We term this 
DHS scheme the Clique-Avoiding (CA) scheme. 

We now prove by induction that CA is always possible. In
the beginning, all nodes are intact and no node is the parent of
another. Therefore, there do not exist any 3 nodes forming
a triangle. Suppose there is no triangle after (τ₀ − 1) repairs.
At time τ = τ₀, suppose a node fails. For future reference,
denote that node as node z. Since the network had no triangle
at time (τ₀ − 1), we only need to ensure that the newcomer z
does not participate in any triangle after the repair. Denote the
helper choice of the CA scheme for time τ = τ₀ by {x, y}.
Therefore, we only need to carefully choose the helper set
{x, y} such that neither nodes {x, y, z} nor nodes {y, x, z}
form a triangle after repair.

^ Since α = 2 = dβ, each node simply stores all the dβ packets it has
received in its local memory.

To prove the existence of such {x, y}, we observe that out
of the n − 1 = 4 surviving nodes, at most r = 1 node is
unavailable. As a result, the newcomer z has 3 nodes to choose
d = 2 helpers from. Say the nodes to choose from are {i, j, k},
and without loss of generality assume node i is the oldest
(being repaired the earliest) and node k is the youngest (being
repaired the latest) of the three. We argue that among the
three pairs (i, j), (j, k), and (i, k), one of them must not form a
parent-child pair. If not, i.e., all three are parent-child pairs,
then nodes {i, j, k} form a triangle at time (τ₀ − 1), which
leads to a contradiction. Say node i is not a parent of k. Then, we
choose nodes i and k to be the helper set {x, y}. As a result,
neither nodes {x, y, z} nor nodes {y, x, z} form a triangle.
By induction, CA is always possible.

Note that the CA scheme needs to use the repair history
to decide which of the three pairs (i, j), (j, k), and (i, k) is not
a parent-child pair and then chooses that pair as the helpers.
Therefore, the choice of the helper sets may vary from time 
to time. This is a significant departure from the principle of 
associating each node x with a fixed helper set. Because the 
CA scheme has to dynamically select the helpers based on 
repair history, we can see that the CA scheme is indeed a 
DHS scheme. 
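The induction argument can be exercised with a small simulation. The sketch below is an illustration under simplifying assumptions. Parent-child relations are stored as unordered pairs, which suffices because a helper is always older than the newcomer it serves. A repaired node drops all of its previous relations, and the scheme picks the first candidate pair that is not a parent-child pair, whereas the text above picks a specific pair. The function names are hypothetical.

```python
import itertools
import random

NODES = (1, 2, 3, 4, 5)  # n = 5, d = 2, r = 1


def has_triangle(parent_pairs):
    """True if some 3 nodes are pairwise in a parent-child relation."""
    return any(
        all(frozenset(p) in parent_pairs for p in itertools.combinations(trio, 2))
        for trio in itertools.combinations(NODES, 3)
    )


def ca_repair(parent_pairs, failed, unavailable):
    """Repair `failed`; return the updated relations and the chosen helpers."""
    # The newcomer replaces the failed node, so its old relations disappear.
    pairs = {p for p in parent_pairs if failed not in p}
    pool = [v for v in NODES if v not in (failed, unavailable)]
    for x, y in itertools.combinations(pool, 2):
        if frozenset((x, y)) not in pairs:  # x and y are not parent and child
            pairs |= {frozenset((x, failed)), frozenset((y, failed))}
            return pairs, (x, y)
    raise RuntimeError("all three candidate pairs are parent-child pairs")


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = set()  # all nodes intact: no node is the parent of another
    for _ in range(10_000):
        failed, unavailable = rng.sample(NODES, 2)
        pairs, helpers = ca_repair(pairs, failed, unavailable)
        assert not has_triangle(pairs)
    print("no triangle appeared")
```

Because the choice depends on `parent_pairs`, which changes with every repair, the same (failed, unavailable) input can return different helpers at different times. This is exactly what makes the scheme dynamic rather than stationary.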

We now describe Part II: What is the coded data sent by
each helper? Our construction uses only the binary field rather
than high-order GF(q). A concrete example will be given after
we give a complete description of our coding scheme.

Initialization: Recall that α = 2, β = 1, and ℳ = 4. Consider
a file of 4 packets X1 to X4. Initially, we let nodes 1 and 2
store {X1, X2} and {X3, X4}, respectively. We then let nodes
3 and 4 store packets {X1, X3} and {X2, X4}, respectively.
Finally, let node 5 store coded packets {[X1 + X2], [X3 + X4]}.
The initialization phase is now complete. See Fig. 2 for the
illustration after the initialization phase.


Fig. 2. The Code of the CA Scheme After Initialization.

For easier description of our code construction, right after
initialization, we artificially define nodes 1 and 2 as the parents
of node 3 even though nodes 1 and 2 are not helpers of node
3. The reason is that the packets in node 3 are {X1, X3} and
they can be viewed as if node 3 has failed and got repaired
from nodes 1 and 2. See Fig. 2 for illustration. Similarly, we
artificially define nodes 1 and 2 as the parents of node 4 (resp.
node 5) even though nodes 1 and 2 are not helpers of node 4
(resp. node 5).

* Recall that node i is older than node k so node k can never be a parent
of node i.

The regular repair operations: Suppose node a fails and
one other node is temporarily unavailable at time τ. We run
the aforementioned CA helper selection scheme to find the
helpers b and c for node a. Denote the two non-helper nodes
by d and e. Each of b and c will send 1 packet to a since
β = 1. The packets are constructed as follows.

Step 1: Denote the two (potentially coded) packets stored in b
by Y_1^(b) and Y_2^(b). Among the three candidate packets
Y_1^(b), Y_2^(b), and the binary sum [Y_1^(b) + Y_2^(b)], node b
will choose one packet, call it Z_b*, and send it to a.

Before describing how to choose Z_b*, we construct two
conditions based on the packets currently stored in nodes c, d,
and e. If nodes c and d jointly contain 4 linearly independent
packets, then we construct Condition 1 to be “Z_b* cannot be
expressed as a linear combination of the two packets stored
in d.” Otherwise, we construct Condition 1 to be “Z_b* cannot
be expressed as a linear combination of the packets stored
in c and d.” Namely, depending on the coded packets stored
in nodes c and d, Condition 1 can be one of the above two
different statements. Similarly, if nodes c and e jointly contain
4 linearly independent packets, then we construct Condition 2
to be “Z_b* cannot be expressed as a linear combination of the
two packets stored in e.” Otherwise, we construct Condition 2
to be “Z_b* cannot be expressed as a linear combination of the
packets stored in c and e.”

After constructing the two conditions, we require the choice of
Z_b* to satisfy both Conditions 1 and 2 simultaneously. If there
is more than one choice of Z_b* satisfying both conditions, then
an arbitrary one of them will do.

Step 2: Denote the two packets stored in c by Y_1^(c) and
Y_2^(c). Among the three candidate packets Y_1^(c), Y_2^(c), and the
binary sum [Y_1^(c) + Y_2^(c)], node c will choose one packet, call
it Z_c*, and send it to node a. We require the packet Z_c* to
satisfy simultaneously: (i) Z_c* cannot be expressed as a linear
combination of Z_b* and the two packets stored in d; and (ii)
Z_c* cannot be expressed as a linear combination of Z_b* and the
two packets stored in e.

Once node a receives Z_b* and Z_c*, it stores both packets in
its local memory (since α = 2).

Lemma 1 (feasibility of the proposed scheme): We can always
find Z_b* and Z_c* satisfying the specified conditions.
As a result, the code can be iteratively constructed for all time
τ = 1 to ∞.

The proof of Lemma 1 is relegated to Appendix D.

Proposition 7: Using the above construction (Parts I and
II), for any time τ, any k = 3 nodes can always reconstruct
the original ℳ = 4 packets X1 to X4. Such a binary code
construction with (α, β, ℳ) = (2, 1, 4) thus satisfies the
reliability requirement (n, k, d, r) = (5, 3, 2, 1).

The proof of Proposition 7 is relegated to Appendix D.

Let us use an example to illustrate our construction. Suppose
after initialization, node 3 fails and node 2 is unavailable.
Newcomer 3 thus has to access two of the nodes {1, 4, 5}
for repair. Since node 1 is the parent of both nodes 4 and
5, the CA scheme will avoid choosing {1, 4} and {1, 5} and
select helpers {4, 5} instead. Specifically, a = 3, b = 4, and
c = 5; and d = 1 and e = 2.

^ Even with the artificially defined parent-child relationship, there is no
triangle after initialization. We can thus use the same induction proof to show
that CA is always possible after initialization.

Since node b = 4 stores {X2, X4}, see Fig. 2, the three
candidates for Z_b* are X2, X4, and [X2 + X4].

Since node c = 5 stores {[X1 + X2], [X3 + X4]} and node
d = 1 stores {X1, X2}, these two thus contain 4 linearly
independent packets. Condition 1 becomes “Z_b* cannot be any
linear expression of packets X1 and X2, the packets in node
d.” Similarly, since e = 2 stores {X3, X4} and jointly nodes
c and e contain 4 linearly independent packets, Condition 2
becomes “Z_b* cannot be any linear expression of packets X3
and X4, the packets in node e.” Out of the three candidates
X2, X4, and [X2 + X4], only the coded packet [X2 + X4] can
satisfy both conditions simultaneously. Therefore we choose
Z_b* = [X2 + X4].

Since node c = 5 stores {[X1 + X2], [X3 + X4]}, the three
candidates for Z_c* are [X1 + X2], [X3 + X4], and
[X1 + X2 + X3 + X4]. The choice of Z_c* thus has to satisfy
simultaneously (i) Z_c* is not a linear combination of
Z_b* = [X2 + X4] and the two packets X1 and X2 in node d; and
(ii) Z_c* is not a linear combination of Z_b* = [X2 + X4] and the
two packets X3 and X4 in node e. Out of the three candidates
[X1 + X2], [X3 + X4], and [X1 + X2 + X3 + X4], only the coded
packet [X1 + X2 + X3 + X4] can satisfy both conditions
simultaneously. Therefore, we choose Z_c* = [X1 + X2 + X3 + X4].

In the end, Z_b* = [X2 + X4] will then be sent to node a
from node b and Z_c* = [X1 + X2 + X3 + X4] will be sent to
node a from node c. Newcomer a will then store both packets
in its storage. The same repair process can then be repeated
and applied to any arbitrary next newcomer.
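The packet selection in Steps 1 and 2 is small enough to run by brute force over the binary field. The sketch below is an illustration only. Packets are encoded as 4-bit masks (bit i − 1 set means Xi is present), addition over the binary field is XOR, and spans are enumerated exhaustively. The function evaluates the rank test of each condition itself rather than assuming which branch applies. All function names are hypothetical. Run on the example above (a = 3, b = 4, c = 5, d = 1, e = 2), it leaves exactly one feasible packet per helper, namely [X2 + X4] from node 4 and [X1 + X2 + X3 + X4] from node 5.

```python
from itertools import combinations

X1, X2, X3, X4 = 1, 2, 4, 8  # packets as bit masks over the binary field


def span(vectors):
    """All binary linear combinations of the given packets."""
    out = {0}
    for v in vectors:
        out |= {w ^ v for w in out}
    return out


def rank(vectors):
    """Number of linearly independent packets (a span has 2**rank elements)."""
    return len(span(vectors)).bit_length() - 1


def choose_from_b(store, b, c, d, e):
    """Step 1: candidates of helper b that satisfy Conditions 1 and 2."""
    y1, y2 = store[b]
    forbidden = []
    for other in (d, e):  # builds Condition 1 (with d), then Condition 2 (with e)
        if rank(store[c] + store[other]) == 4:
            forbidden.append(span(store[other]))
        else:
            forbidden.append(span(store[c] + store[other]))
    return [z for z in (y1, y2, y1 ^ y2) if all(z not in f for f in forbidden)]


def choose_from_c(store, zb, c, d, e):
    """Step 2: candidates of helper c that satisfy requirements (i) and (ii)."""
    y1, y2 = store[c]
    return [z for z in (y1, y2, y1 ^ y2)
            if z not in span([zb] + store[d]) and z not in span([zb] + store[e])]


if __name__ == "__main__":
    store = {1: [X1, X2], 2: [X3, X4], 3: [X1, X3],
             4: [X2, X4], 5: [X1 ^ X2, X3 ^ X4]}
    a, b, c, d, e = 3, 4, 5, 1, 2
    (zb,) = choose_from_b(store, b, c, d, e)
    (zc,) = choose_from_c(store, zb, c, d, e)
    store[a] = [zb, zc]
    print(bin(zb), bin(zc))
    # Reliability check: every set of k = 3 nodes spans all 4 packets.
    print(all(rank(store[i] + store[j] + store[k]) == 4
              for i, j, k in combinations(store, 3)))
```

Exhaustive span enumeration is only reasonable here because the file has 4 packets; a larger construction would call for Gaussian elimination instead.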

The above CA helper selection scheme (Part I) and its
code construction (Part II) thus achieve the new storage-BW
tradeoff in (18) for the case of (n, k, d, r) = (5, 3, 2, 1).

For the case of (n, k, d, r) = (5, 4, 2, 1), we notice that we
can use the same code that is constructed for (n, k, d, r) =
(5, 3, 2, 1) to achieve the same storage-BW tradeoff in (18). At
the end of Section V-A, we have already proven that the best
SHS storage-BW tradeoff curve of (n, k, d, r) = (5, 4, 2, 1)
is governed by (15). As a result, Proposition 2 is proven for
(n, k, d, r) = (5, 3, 2, 1) and (5, 4, 2, 1).

C. The Optimal Solution For (n, k, d, r) = (5, 3, 2, 1) and
(5, 4, 2, 1)

In this subsection, we prove that there exists no scheme
(DHS or SHS) that can outperform the storage-BW tradeoff
curve in (18). Therefore, the scheme described in Section V-B
is optimal.

Proposition 8: Suppose (n, k, d, r) = (5, 3, 2, 1) or
(5, 4, 2, 1) and consider any arbitrary DHS scheme A. We have
that

$$\min_{G \in \mathcal{G}_A}\ \min_{t \in \mathrm{DC}(G)} \operatorname{mincut}_G(s,t) \le 2\min(2\beta, \alpha). \tag{19}$$

Observe that, by (18), the proposed CA scheme and the
corresponding code construction achieve the upper bound in
Proposition 8 above. Therefore, we have that the proposed
scheme is indeed optimal and there exists no helper selection
scheme that can outperform it.

^ In this way, we are overprotecting the data since the actual k* = 3 but
the target k = 4.

Proof of Proposition 8: We first prove this proposition for
(n, k, d, r) = (5, 3, 2, 1). Consider an IFG G* ∈ 𝒢_A such that
all its nodes have been repaired before. Consider the newest
node in G* that we denote by z. Observe that z must be
connected to two older active nodes, call them x and y. Now,
consider a data collector that is connected to {x, y, z}. We
can see that node x will contribute min(2β, α) to the value
of the cut that directly separates the root from the three nodes
{x, y, z}. Moreover, node y will contribute at most min(2β, α)
to the value of the cut that directly separates the root from
{x, y, z}. On the other hand, node z cannot contribute anything
to the cut value since it is connected to both x and y. Therefore,
the value of the cut that directly separates the root and {x, y, z}
is at most 2 min(2β, α). As a result, the minimum cut value
mincut_{G*}(s, t) for that particular t is upper bounded by
2 min(2β, α). Taking the minimum over all possible t and all
possible G ∈ 𝒢_A, we get the inequality in (19) and the proof
is complete.

For the case of (n, k, d, r) = (5, 4, 2, 1), consider an IFG
G* ∈ 𝒢_A such that all its nodes have been repaired before.
Consider the newest node in G* that we denote by z. Observe
that z must be connected to two older active nodes, call them x
and y. Denote the nodes other than nodes x, y, and z by nodes
w and u. We fail node w and make u temporarily unavailable.
Repair node w according to the given scheme A, which must
access 2 out of the three remaining nodes {x, y, z}.

Now, consider a data collector that is connected to
{x, y, z, w}. We can see that node x will contribute
min(2β, α) to the value of the cut that directly separates
the root from {x, y, z, w} and node y will contribute at most
min(2β, α) to the value of that cut. Nodes z and w will not
contribute any amount to the cut value since z is connected to
{x, y} and w is connected to two of {x, y, z}. By the same
argument as in the case of (n, k, d, r) = (5, 3, 2, 1), we have
thus proven (19) for the case of (n, k, d, r) = (5, 4, 2, 1). ■

D. A Byproduct of Propositions 1 and 2

As we saw in Table I, we have two repair modes in
distributed storage codes: functional repair and exact repair.
Recall that in functional repair, nodes are allowed to store 
any functions of the original data, i.e., nodes do not have to 
retain the same packets at all times. In exact repair, however, 
nodes are required to store the same packets at all times. 
RCs Q were originally proposed with functional repair since 
functional repair is more general and could potentially lead 
to more repair-BW reduction. Exact repair was subsequently 
considered as it was observed that it can decrease overhead 
compared to functional repair due to the fact that the decoding 
and repairing rules are fixed in exact repair as opposed to the 
changing rules in functional repair. Moreover, it is possible 
using an exact repair code to have the original data be the 
systematic packets of the code which greatly facilitates data 


retrieval and reconstruction. Exact repair codes that achieve 
the MSR point of RCs (with BHS) were given in fT^, 
|[T^ and for the MBR point in ifTbl . fTsl. In IT^ . it was shown 
that the majority of the interior points on the tradeoff curve 
of RCs cannot be achieved by exact repair. The exact repair 
rate region of the simple case of (n, k, d, r) = (4,3, 3,0) was 
characterized in ED and it was shown that indeed there is a 
gap between the optimal tradeoff of functional repair and exact 
repair. Specifically, functional repair is strictly more powerful
than exact repair and its benefits should not be overlooked.

This fundamental finding is proven by a computer-aided
proof. It turns out that if we focus on an (n, k, d, r) value
other than (4, 3, 3, 0), we can easily prove the
same statement “functional repair strictly outperforms exact
repair” without resorting to the relatively involved
computer-aided-proof approach.

Proposition 9: For (n, k, d, r) = (5, 3, 2, 1) and (5, 4, 2, 1),
there exists at least one pair of (α, β) values such that LRRCs
with functional repair can protect a file of size strictly larger
than that of LRRCs constrained to exact repair. Furthermore,
for these two (n, k, d, r) values, the superiority of functional
repair over exact repair occurs in both the MSR and MBR
points, unlike the case of (n, k, d, r) = (4, 3, 3, 0) where the
superiority occurs only in the interior points.

Proof of Proposition 9: The proof is by contradiction.
Consider (n, k, d, r) = (5, 3, 2, 1) or (5, 4, 2, 1). The following
arguments work for both cases. By Propositions 1 and 2, we
know that the tradeoff of the best SHS scheme is the same as
that of BHS and that DHS strictly outperforms BHS for at least
one pair of (α, β) values. Suppose now that there exist exact
repair LRRCs with DHS that can achieve the entire optimal
tradeoff in (18). Since such a code is an exact repair code, the
same helper nodes that can repair a failed node at time τ = 1
can be used to repair the same failed node at any other time
τ. Specifically, instead of having a helper selection
D_τ({F_i}_{i=1}^τ, {U_i}_{i=1}^τ) that changes over time, we can
simply reuse the time-1 choice, i.e., set D_τ = D_1(F_τ, U_τ)
for all τ.

The reason is that in exact repair the packets on the nodes
are the same at any time, so we can reuse the helper choice
in time 1 and the resulting new code should still meet the
reliability requirement. Therefore, the considered exact repair
LRRC with DHS can be converted to an exact repair LRRC
with SHS with the same tradeoff curve (18). This, however,
yields a contradiction with Proposition 1, which states that with
SHS we cannot protect a file of size larger than that in (15).

If we compare the tradeoff curve (18) of functional repair
and the tradeoff curve (15) of the best possible exact repair,
see Fig. 1, it is clear that the superiority of functional repair
over exact repair occurs in both the MSR and MBR points.
Hence, the proof is complete. ■
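For concreteness, the corner points of the two curves can be worked out from the constraints min(2β, α) + min(β, α) ≥ ℳ for the best exact repair and 2 min(2β, α) ≥ ℳ for functional repair. The derivation below is added as an illustration and uses only these two constraints with d = 2.

$$
\begin{aligned}
\text{Exact repair, MSR (minimum } \alpha\text{):}&\quad \alpha = \tfrac{\mathcal{M}}{2},\ \ d\beta = \mathcal{M};\\
\text{Exact repair, MBR (minimum } d\beta\text{):}&\quad 3\beta \ge \mathcal{M}\ \Rightarrow\ d\beta = \tfrac{2\mathcal{M}}{3},\ \ \alpha = \tfrac{2\mathcal{M}}{3};\\
\text{Functional repair:}&\quad \alpha \ge \tfrac{\mathcal{M}}{2},\ \ \beta \ge \tfrac{\mathcal{M}}{4}
\ \Rightarrow\ (\alpha,\ d\beta) = \big(\tfrac{\mathcal{M}}{2},\ \tfrac{\mathcal{M}}{2}\big).
\end{aligned}
$$

The functional-repair curve thus has a single corner point that serves as both its MSR and its MBR point. At the MSR storage α = ℳ/2 it needs repair BW ℳ/2 instead of ℳ, and its repair BW ℳ/2 is also below the exact-repair MBR value 2ℳ/3.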

We can see that the proof of Proposition |9| provides a new 
simple proof technique that can show that exact repair cannot 
achieve the performance of functional repair under LRRCs by 
designing a DHS scheme that strictly outperforms all SHS 
schemes. 
VI. When Can DHS/SHS Outperform BHS?

In this section, we prove Proposition 3, which focuses
on answering the question: Given (n, k, d, r) values, does
there exist a DHS/SHS scheme that outperforms the baseline
BHS scheme?


A. The (n, k, d, r) Values For Which BHS is Optimal 

For easier reference, we reproduce the converse part of
Proposition 3 as the following proposition.

Proposition 10: If k ≤ ⌈(n − r)/(n − d − r)⌉, then for any
arbitrary DHS scheme A, we have

$$\min_{G \in \mathcal{G}_A}\ \min_{t \in \mathrm{DC}(G)} \operatorname{mincut}_G(s,t) = \sum_{i=0}^{k-1} \min\big((d-i)^+ \beta,\ \alpha\big). \tag{20}$$

Specifically, even the most intelligent helper selection will
have the same tradeoff curve (6) as BHS.
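The sum in Proposition 10, Σ_{i=0}^{k−1} min((d − i)⁺β, α), is easy to evaluate directly. The sketch below uses a hypothetical function name and illustrates that for d = 2 the terms with i ≥ 2 vanish, so k = 3 and k = 4 give the same value, namely min(2β, α) + min(β, α).

```python
def bhs_min_cut(k, d, alpha, beta):
    """Sum over i = 0..k-1 of min((d - i)^+ * beta, alpha)."""
    return sum(min(max(d - i, 0) * beta, alpha) for i in range(k))


if __name__ == "__main__":
    alpha, beta = 2, 1
    print(bhs_min_cut(3, 2, alpha, beta), bhs_min_cut(4, 2, alpha, beta))  # prints: 3 3
```

This matches the earlier observation that with BHS the (5, 4, 2, 1) case is governed by the same curve as (5, 3, 2, 1).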

Before presenting the proof of Proposition 10, we introduce
the following definition and lemma.

Definition 4: A set of m active storage nodes (input-output
pairs) of an IFG is called an m-set if the following conditions
are satisfied simultaneously. (i) Each of the m active nodes has
been repaired at least once; and (ii) jointly the m nodes satisfy
the following property: Consider any two distinct active nodes
x and y in the m-set and without loss of generality assume
that x was repaired before y. Then there exists an edge in the
IFG that connects x_out and y_in.

In some way (if we group the two nodes u_in and u_out together
as a single node), an m-set can be viewed as the generalization
of the m-clique for the IFGs. Notice that a triangle in a graph
is also a 3-set according to the above definition. Now, we start
the proof by stating the following lemma, which is the core of
the proof.

Lemma 2: Fix the helper selection scheme A. There exists
an IFG G ∈ 𝒢_A(n, k, d, r, α, β) in which at least one
⌈(n − r)/(n − d − r)⌉-set exists in its set of active nodes.


Proof of Lemma 2: We prove this lemma by explicit
construction. Start with a graph in which all active
nodes have been failed/repaired before. Define V as the set of
active nodes of this graph corresponding to physical storage
nodes {1, 2, ..., r}, where r is the system parameter that limits
the maximum number of temporarily unavailable nodes. Now, fail
and repair the nodes {r + 1, r + 2, ..., n} in this order, with
V being the set of unavailable helper nodes fixed for all times
of repair, i.e., we fail node (r + 1) first and repair it with
the nodes in V all unavailable, then we
fail node (r + 2) and repair it with V being unavailable too,
and so on. The final IFG we get is denoted as graph G.

We prove that G has at least one ⌈(n − r)/(n − d − r)⌉-set by
proving the following stronger claim: Consider any integer value
m ≥ 1. Denote the set of active nodes of G \ V by V^c. There
exists an m-set in every subset of (m − 1)(n − d − r) + 1 active
nodes of V^c.

We first describe how to use the above claim and then
provide the corresponding proof. Since the G we consider has
n active nodes in total, |V^c| = n − r. By solving for the
largest m value satisfying (m − 1)(n − d − r) + 1 ≤ |V^c| = n − r,
the above stronger claim implies that V^c must contain a
⌈(n − r)/(n − d − r)⌉-set. Lemma 2 is thus proven.

We now prove this claim by induction on the value of $m$. When $m = 1$, by the definition of the $m$-set, any group of 1 active node in $V^c$ forms a 1-set. The claim thus holds naturally.

Suppose the claim is true for all $m < m_0$. We now claim that in every group of $(m_0-1)(n-d-r)+1$ active nodes of $V^c$ there exists an $m_0$-set. The reason is as follows. Given an arbitrary but fixed group of $(m_0-1)(n-d-r)+1$ active nodes of $V^c$, we use $y$ to denote the youngest active node in this group (the one which was repaired last). There are $(m_0-1)(n-d-r)$ active nodes in this group other than $y$. On the other hand, since any newcomer accesses $d$ helpers out of the surviving nodes during its repair, node $y$, when it was repaired, was able to avoid connecting to at most $(n-r-1)-d$ surviving nodes of $V^c$. Therefore, out of the remaining $(m_0-1)(n-d-r)$ active nodes in this group, node $y$ must be connected to at least $(m_0-1)(n-d-r) - (n-r-1-d) = (m_0-2)(n-d-r)+1$ of them. By induction, among those $\ge (m_0-2)(n-d-r)+1$ nodes of $V^c$, there exists an $(m_0-1)$-set. Since, by our construction, $y$ is connected to all nodes in this $(m_0-1)$-set, node $y$ and this $(m_0-1)$-set jointly form an $m_0$-set. The proof of the stronger claim and hence Lemma 2 is thus complete. ■
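The construction and the induction step can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: helper choice is uniformly random (a stand-in for an arbitrary scheme $A$), an $m$-set is taken to mean a set of active nodes in which every younger node is connected to every older one, and the function names are hypothetical.

```python
import random

def lemma2_m_set(n, d, r, rng):
    """Simulate the Lemma 2 construction and greedily extract an m-set.

    Assumptions: nodes 1..r form V and are always unavailable; the helper
    choice is uniformly random, standing in for an arbitrary scheme A.
    """
    helpers = {}
    for j in range(r + 1, n + 1):            # fail/repair r+1, ..., n in order
        available = [i for i in range(r + 1, n + 1) if i != j]
        helpers[j] = set(rng.sample(available, d))
    group, m_set = set(range(r + 1, n + 1)), []
    while group:
        y = max(group)                       # youngest node: repaired last
        m_set.append(y)
        # keep only the older nodes of the group that y connected to
        group = {i for i in group if i < y and i in helpers[y]}
    return m_set, helpers

def guaranteed_size(n, d, r):
    """Largest m with (m-1)(n-d-r)+1 <= n-r, i.e. ceil((n-r)/(n-d-r))."""
    return -(-(n - r) // (n - d - r))

m_set, _ = lemma2_m_set(8, 4, 1, random.Random(0))
print(len(m_set), ">=", guaranteed_size(8, 4, 1))
```

The greedy loop mirrors the induction: each step keeps the youngest node and restricts the group to the older nodes it connected to, so the group shrinks by at most $(n-d-r)$ nodes per step.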

Proof of Proposition 7: We now prove (20). Consider an IFG $G \in \mathcal{G}_A$ that satisfies Lemma 2. Since $k \le \lceil (n-r)/(n-d-r) \rceil$, we can construct a data collector of $G$ that connects to $k$ nodes out of the nodes of the $\lceil (n-r)/(n-d-r) \rceil$-set in $G$. Call this data collector $t_0$. If we focus on the edge cut that directly separates source $s$ and the $k$ node pairs connected to $t_0$, one can use the same analysis as in lEl Lemma 2] and derive "$\mathrm{mincut}_G(s, t_0) \le \sum_{i=0}^{k-1} \min((d-i)^+\beta, \alpha)$" for the given $G \in \mathcal{G}_A$ and the specific choice of $t_0$. Therefore, we have

$$\min_{G \in \mathcal{G}_A} \min_{t \in \mathrm{DC}(G)} \mathrm{mincut}_G(s,t) \le \sum_{i=0}^{k-1} \min\left((d-i)^+\beta, \alpha\right). \tag{21}$$

On the other hand, by definition we have

$$\min_{G \in \mathcal{G}_A} \min_{t \in \mathrm{DC}(G)} \mathrm{mincut}_G(s,t) \ge \min_{G \in \mathcal{G}} \min_{t \in \mathrm{DC}(G)} \mathrm{mincut}_G(s,t). \tag{22}$$

Then by (21), (22), and (5), we have proven that whenever $k \le \lceil (n-r)/(n-d-r) \rceil$, equality (20) is true. Hence, the proof is complete.


B. The Achievability Proof: Description of a New Helper 
Selection Scheme 


For easier reference, we reproduce the achievability part of Proposition 3 as the following proposition.

Proposition 11: If $\min(d+1, k) > \left\lceil \frac{n}{n-d-r} \right\rceil$, then there exists an SHS scheme and a pair of $(\alpha, \beta)$ such that

$$\min_{G \in \mathcal{G}_A} \min_{t \in \mathrm{DC}(G)} \mathrm{mincut}_G(s,t) > \sum_{i=0}^{k-1} \min\left((d-i)^+\beta, \alpha\right). \tag{23}$$

Fig. 3. The MFHS scheme for (n, d, r) = (8, 4, 1) and the illustration of the repair process of each of the 8 nodes. Each newcomer may choose to access (d + r) = 5 helpers, as illustrated by the arrows. However, only d of them will actually be accessed since we assume r = 1 of the helpers may be temporarily unavailable.

We prove the above result by explicit construction. In this subsection we will first describe how we choose the helpers and we will analyze its performance in the next subsection.
The proposed scheme is called modified family helper selection 
(MFHS) scheme, which is based on the family helper selection 
(FHS) scheme in Q-IH that was originally devised for the 
case of r = 0, i.e., all helper nodes are always available. 

The MFHS can be described as follows. First, we arbitrarily sort all storage nodes and denote them by 1 to $n$. Then, we define a complete family as a group of $(n-d-r)$ physical nodes. The first $(n-d-r)$ nodes are grouped as the first complete family, the second $(n-d-r)$ nodes are grouped as the second complete family, and so on. In total, we have $\left\lfloor \frac{n}{n-d-r} \right\rfloor$ complete families. The remaining $n \bmod (n-d-r)$ nodes, if there are any, are grouped as an incomplete family. For any node $F$, if $F$ belongs to a complete family, we use $D(F)$ to denote the set of nodes outside the family of $F$. Since the family of node $F$ has $(n-d-r)$ nodes, $D(F)$ contains exactly $n - (n-d-r) = d + r$ nodes. If $F$ belongs to an incomplete family, we use $D(F)$ to denote the set of nodes from the first node to the $(d+r)$-th node (recall that we sorted the nodes in the very beginning). Again $D(F)$ contains exactly $(d+r)$ nodes.

One can view the set $D(F)$ as the candidate helper set when node $F$ fails. Specifically, when node $F$ fails and the nodes in $U$, $|U| \le r$, are unavailable, we choose the helper set of node $F$ by $D(F, U) = D(F) \setminus U$. Note that we will have $|D(F) \setminus U| = (d+r) - r = d$ if $|U| = r$ and $U \subseteq D(F)$. If the unavailable node set is smaller, $|U| < r$, or if the unavailable node set is not all within $D(F)$, then we simply let node $F$ access the first $d$ available nodes (those with the smallest node indices) in $D(F)$ for repair.

For example, suppose that $(n,d,r) = (8,4,1)$. There are 2 complete families, $\{1,2,3\}$ and $\{4,5,6\}$, and 1 incomplete family, $\{7,8\}$. See Fig. 3 for illustration. Then suppose node 4 fails. Since node 4 belongs to the complete family $\{4,5,6\}$, $D(4) = \{1,2,3,7,8\}$ since nodes 1, 2, 3, 7, and 8 are outside the family of node 4. Therefore, if node 2 is temporarily unavailable, $U = \{2\}$, the newcomer will then access nodes $D(4) \setminus U = \{1,3,7,8\}$ for repair. If it is node 8 that is unavailable, then the newcomer will access $D(4) \setminus \{8\} = \{1,2,3,7\}$ for help. Similarly, if node 7 fails, then since node 7 belongs to the incomplete family $\{7,8\}$, the corresponding candidate helper set contains the first $(d+r) = 5$ nodes, $D(7) = \{1,2,3,4,5\}$. If node 2 is unavailable ($U = \{2\}$), then the helpers become $D(7) \setminus U = \{1,3,4,5\}$. If, say, node 8 is unavailable ($U = \{8\}$), the set $D(7) \setminus U = \{1,2,3,4,5\}$ now contains $5 > d = 4$ nodes. In this scenario, we simply let the newcomer access $\{1,2,3,4\}$, the first $d = 4$ nodes of $D(7)$, for repair.
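The helper selection rule above is short enough to sketch in code. The following is a minimal illustration, not the authors' implementation: the function names are hypothetical, nodes are numbered 1 to n as in the text, and the two cases of the rule are merged into "take the first d available nodes of D(F)", which coincides with D(F)\U when |U| = r and U lies inside D(F).

```python
def candidate_set(node, n, d, r):
    """Candidate helper set D(F) of the MFHS scheme (nodes are 1..n)."""
    size = n - d - r                         # size of a complete family
    num_complete = n // size
    family = (node - 1) // size              # 0-based family position
    if family < num_complete:                # complete family: all nodes outside it
        members = range(family * size + 1, (family + 1) * size + 1)
        return [i for i in range(1, n + 1) if i not in members]
    return list(range(1, d + r + 1))         # incomplete family: first d + r nodes

def helpers(node, unavailable, n, d, r):
    """First d available nodes of D(F), smallest node indices first."""
    avail = [i for i in candidate_set(node, n, d, r) if i not in unavailable]
    return avail[:d]

# The (n, d, r) = (8, 4, 1) example from the text.
print(candidate_set(4, 8, 4, 1))             # D(4)
print(helpers(4, {2}, 8, 4, 1))              # node 2 unavailable
print(helpers(7, {8}, 8, 4, 1))              # node 8 unavailable, 5 candidates left
```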


C. The Achievability Proof: Analysis of The Modified Family 
Helper Selection 

In the following, we analyze the performance of the modified family helper selection scheme (MFHS). Before we analyze the performance, we introduce some useful definitions.

Definition of the family index vector: Notice that the MFHS scheme has in total $\left\lceil \frac{n}{n-d-r} \right\rceil$ families, which we index from 1 to $\left\lceil \frac{n}{n-d-r} \right\rceil$. However, since the incomplete family has different properties from the complete families, we index the incomplete family by the family index 0. The family indices thus run from 1 to $c = \left\lfloor \frac{n}{n-d-r} \right\rfloor$ and then 0, where $c$ is the index of the last complete family. If there is no incomplete family, we omit the index 0. Moreover, notice that any member of the incomplete family has $D(F) = \{1, \dots, d+r\}$. That is, for an incomplete family node $F$, $D(F)$ contains all the members of the first $(c-1)$ complete families and only the first $(d+r) - (n-d-r)(c-1) = n \bmod (n-d-r)$ members of the last complete family $c$. Among the $(n-d-r)$ members in the last complete family, we add a negative sign to the family indices of those who will "not" be helpers for the incomplete family.

We use the notation $FI(n_0)$ to denote the family index of node $n_0$. We can now list the family indices of the $n$ nodes as an $n$-dimensional family index vector defined as $(FI(1), FI(2), \dots, FI(n))$. Considering the same example above where $(n,d,r) = (8,4,1)$, the family index vector is $(1, 1, 1, 2, 2, -2, 0, 0)$.

Definitions of the family index permutation and RFIP: A family index permutation is a permutation of the family index vector, which we denote by $\pi_f$. Using the previous example, one instance of family index permutations is $\pi_f = (1, 1, 0, 2, 0, -2, 1, 2)$. A rotating family index permutation (RFIP) $\pi_f^*$ is a special family index permutation that puts the family indices of the family index vector in an $(n-d-r) \times \left\lceil \frac{n}{n-d-r} \right\rceil$ table column-by-column and then reads it row-by-row. Fig. 4 illustrates the construction of the RFIP for the case of $(n,d,r) = (8,4,1)$. The input is the family index vector $(1, 1, 1, 2, 2, -2, 0, 0)$ and the output RFIP $\pi_f^*$ is $(1, 2, 0, 1, 2, 0, 1, -2)$.

Fig. 4. The construction of the RFIP for (n, d, r) = (8, 4, 1).
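The two definitions can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and it assumes that no negative sign is applied when there is no incomplete family (a case the text does not spell out).

```python
def family_index_vector(n, d, r):
    """Family indices (FI(1), ..., FI(n)) of the MFHS scheme."""
    size = n - d - r                         # size of a complete family
    c = n // size                            # index of the last complete family
    rem = n % size                           # size of the incomplete family
    fiv = []
    for node in range(1, n + 1):
        fam = (node - 1) // size + 1
        if fam > c:
            fiv.append(0)                    # incomplete family
        elif fam == c and rem > 0 and node > d + r:
            fiv.append(-c)                   # not a helper of the incomplete family
        else:
            fiv.append(fam)
    return fiv

def rfip(fiv, rows):
    """Fill a table with `rows` rows column-by-column, then read it row-by-row."""
    # Entry i sits in row i % rows, so one row is every `rows`-th entry.
    return [fiv[i] for row in range(rows) for i in range(row, len(fiv), rows)]

fiv = family_index_vector(8, 4, 1)
print(fiv)
print(rfip(fiv, 8 - 4 - 1))
```

For $(n,d,r) = (8,4,1)$ the table has 3 rows and its last column holds only the two entries of the incomplete family, which is why the third row is one entry shorter.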

We now analyze the performance of the MFHS scheme. 

Proposition 12: Consider any given MFHS scheme $F$ with the corresponding IFGs denoted by $\mathcal{G}_F(n,k,d,r,\alpha,\beta)$. We have that

$$\min_{G \in \mathcal{G}_F} \min_{t \in \mathrm{DC}(G)} \mathrm{mincut}_G(s,t) = \min_{\forall \pi_f} \sum_{i=1}^{k} \min\left((d - y_i(\pi_f))^+\beta, \alpha\right), \tag{24}$$

where $\pi_f$ can be any family index permutation and $y_i(\pi_f)$ is computed as follows. If the $i$-th coordinate of $\pi_f$ is 0, then $y_i(\pi_f)$ returns the number of $j$ satisfying both (i) $j < i$ and (ii) the $j$-th coordinate $> 0$. If the $i$-th coordinate of $\pi_f$ is not 0, then $y_i(\pi_f)$ returns the number of $j$ satisfying both (i) $j < i$ and (ii) the absolute value of the $j$-th coordinate of $\pi_f$ and the absolute value of the $i$-th coordinate of $\pi_f$ are not equal. For example, if $\pi_f = (1, 2, -2, 1, 1, 0, 0, 1, 2, -2)$, then $y_6(\pi_f) = 4$ and $y_{10}(\pi_f) = 6$.
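As a quick check of the definition, the sketch below computes $y_i(\pi_f)$ and the inner sum of (24) for one fixed permutation. The function names are hypothetical and the index $i$ is 1-based, as in the text.

```python
def y(pi_f, i):
    """y_i(pi_f) for a 1-based index i, following the two cases in the text."""
    cur = pi_f[i - 1]
    earlier = pi_f[: i - 1]
    if cur == 0:
        # incomplete family: count earlier coordinates that are positive
        return sum(1 for v in earlier if v > 0)
    # complete family: count earlier coordinates from a different family
    return sum(1 for v in earlier if abs(v) != abs(cur))

def cut_bound(pi_f, k, d, alpha, beta):
    """Inner sum of (24) for one fixed family index permutation."""
    return sum(min(max(d - y(pi_f, i), 0) * beta, alpha) for i in range(1, k + 1))

pi_f = (1, 2, -2, 1, 1, 0, 0, 1, 2, -2)
print(y(pi_f, 6), y(pi_f, 10))               # the two values quoted in the text
```

Evaluating the right-hand side of (24) exactly would additionally require minimizing `cut_bound` over all family index permutations, which this sketch does not attempt.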

The proof of this proposition will be provided at the end of this subsection.

We notice that computing the right-hand side of (24) requires searching over all possible permutations $\pi_f$. The following proposition shows that when focusing on the minimum bandwidth repair (MBR) point, one can further simplify the expression.

Proposition 13: Consider any $(n, k, d, r)$ values and the MFHS scheme. The MBR point of the MFHS scheme is

$$\alpha_{\mathrm{MBR}} = d\beta_{\mathrm{MBR}} = \frac{d\mathcal{M}}{\sum_{i=1}^{k} (d - y_i(\pi_f^*))^+}, \tag{25}$$

where $\pi_f^*$ is the rotating family index permutation (RFIP).
The proof of Proposition 13 is relegated to Appendix E. Proposition 13 directly implies Proposition 11, the achievability part of Proposition 3. The reason is as follows. We first notice that by the definition of $y_i(\cdot)$, we always have $y_i(\pi_f) \le i - 1$. Suppose $\min(d+1, k) > \left\lceil \frac{n}{n-d-r} \right\rceil$ and consider the MFHS scheme. Since there are exactly $\left\lceil \frac{n}{n-d-r} \right\rceil$ families (including both complete and incomplete families), among the first $\min(d+1, k)$ indices of a family index permutation there is at least one family index that is repeated. Say the $j_1$-th and the $j_2$-th coordinates of $\pi_f$ are of the same value, where $j_1, j_2 \le \min(d+1, k)$. Without loss of generality, we assume $j_1 < j_2$. Then, by the definition of $y_i(\cdot)$, we have $y_{j_2}(\pi_f) < j_2 - 1$ with a strict inequality since the $j_1$-th coordinate of $\pi_f$ will not contribute to $y_{j_2}(\pi_f)$. Letting $q = \min(d+1, k)$, we thus have

$$\sum_{i=1}^{k} (d - y_i(\pi_f))^+ = \sum_{i=1}^{q} (d - y_i(\pi_f)) + \sum_{i=q+1}^{k} (d - y_i(\pi_f))^+ \tag{26}$$

$$> \sum_{i=1}^{q} (d - (i-1)) + \sum_{i=q+1}^{k} (d - (i-1))^+ \tag{27}$$

$$= \sum_{i=1}^{k} (d - (i-1))^+, \tag{28}$$

where (26) follows from $y_i(\pi_f) \le (i-1)$ so that we can remove the $(\cdot)^+$ when $i = 1$ to $\min(d+1, k)$ without changing the value; (27) follows from $y_{j_2}(\pi_f) < (j_2 - 1)$ and from $y_i(\pi_f) \le (i-1)$ for arbitrary $i$; and (28) follows from $(d - (i-1)) \ge 0$ for all $i = 1$ to $\min(d+1, k)$.

We now compare the MBR points of the MFHS and the BHS scheme. The MBR point of the MFHS scheme is described by (25) while the MBR point of the BHS scheme is described by

$$\alpha_{\mathrm{MBR}} = d\beta_{\mathrm{MBR}} = \frac{d\mathcal{M}}{\sum_{i=1}^{k} (d - (i-1))^+}. \tag{29}$$

Ineq. (28) then implies that the MFHS strictly outperforms BHS by having strictly smaller storage/BW since (25) is strictly less than (29).
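To make the comparison concrete, the sketch below evaluates (25) and (29) for $(n,d,r) = (8,4,1)$ with the RFIP quoted earlier. The choice $k = 4$ and the normalized file size $\mathcal{M} = 1$ are assumptions made only for this illustration (with $k = 4$ we have $\min(d+1,k) = 4 > 3$, the number of families); the function names are hypothetical.

```python
from fractions import Fraction

def y(pi_f, i):
    """y_i(pi_f) for a 1-based index i."""
    cur, earlier = pi_f[i - 1], pi_f[: i - 1]
    if cur == 0:
        return sum(1 for v in earlier if v > 0)
    return sum(1 for v in earlier if abs(v) != abs(cur))

def mbr_alpha_mfhs(rfip, k, d, file_size=1):
    """alpha_MBR of (25): d*M / sum_i (d - y_i(RFIP))^+."""
    total = sum(max(d - y(rfip, i), 0) for i in range(1, k + 1))
    return Fraction(d * file_size, total)

def mbr_alpha_bhs(k, d, file_size=1):
    """alpha_MBR of (29): d*M / sum_i (d - (i-1))^+."""
    total = sum(max(d - (i - 1), 0) for i in range(1, k + 1))
    return Fraction(d * file_size, total)

RFIP = (1, 2, 0, 1, 2, 0, 1, -2)             # RFIP for (n, d, r) = (8, 4, 1)
print(mbr_alpha_mfhs(RFIP, k=4, d=4), mbr_alpha_bhs(k=4, d=4))
```

Here the first four coordinates of the RFIP contain family index 1 twice, so $y_4 = 2 < 3$ and the MFHS denominator is 11 instead of 10.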

It is worth mentioning that the result in Proposition 12 is weaker than the results in Section IV-B in the following sense. The storage-bandwidth tradeoff curve in Proposition 12 is based purely on a min-cut analysis similar to those in Q, while relying on the assumption that random linear network coding (RLNC) with sufficiently large finite fields can attain the min-cut capacity for infinitely many IFGs. Also see the discussion in [23]. In contrast, the code existence result in Section IV-B is in the strongest sense since we provide an explicit binary code construction and then directly analyze its performance, see Proposition 8, without using any min-cut analysis.

For some classes of $(n, k, d, r)$ combinations, it is possible to derive explicit code constructions without relying on the RLNC-based assumption. The code constructions are rather involved and for that reason we omit them from this work and provide them in [2].

We close this subsection by providing the proof of Proposition 12.

Proof of Proposition 12: Recall that the MFHS scheme specifies the helper candidate set $D(i)$ for nodes $i = 1$ to $n$ based on the concepts of complete and incomplete families. In the following discussion, we assume that the helper candidate set $D(i)$ is generated by the given MFHS scheme.

Using the same proof technique of [3, Proposition 5] and ll?] Lemma 2], we can get the following lower bound on the smallest possible mincut of an IFG generated by $F$:

$$\min_{G \in \mathcal{G}_F} \min_{t \in \mathrm{DC}(G)} \mathrm{mincut}_G(s,t) \ge \min_{p \in \mathcal{P}} \sum_{i=1}^{k} \min\left((d - z_i(p))^+\beta, \alpha\right), \tag{30}$$

where $p$ is a $k$-dimensional integer-valued vector, $\mathcal{P} = \{(p_1, p_2, \dots, p_k) : \forall i \in \{1, \dots, k\},\ 1 \le p_i \le n\}$ and $z_i(p) = |\{p_j : j < i,\ p_j \in D(p_i)\}|$. For example, suppose $(n,k,d,r) = (6,4,2,1)$, $D(3) = \{1,4,5\}$, and $p = (1,2,1,3)$. Since $p_4 = 3$, we have $z_4(p) = |\{p_j : j < 4,\ p_j \in D(3)\}| = 1$. (The double appearances of $p_1 = p_3 = 1$ are only counted as one.)
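The function $z_i(p)$ is a set-valued count, which is easy to get wrong when $p$ repeats a node. A minimal sketch, with a hypothetical function name and the candidate sets passed in as a dictionary:

```python
def z(p, i, D):
    """z_i(p) = |{p_j : j < i, p_j in D(p_i)}| for a 1-based index i."""
    # A set comprehension counts a repeated node index only once.
    return len({pj for pj in p[: i - 1] if pj in D[p[i - 1]]})

# Example from the text: only D(3) is needed to evaluate z_4(p).
D = {3: {1, 4, 5}}
p = (1, 2, 1, 3)
print(z(p, 4, D))
```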

The main intuition behind (30) is that for any source $s$ and data collector $t$, we consider the min-cut $(V, V^c)$ separating $t$ from $s$. That is, $V$ and $V^c$ form a partition of the nodes in the IFG; $s \in V$ and $t \in V^c$; and the set of edges from $V$ to $V^c$ is the minimum edge cut. Since $t \in V^c$, the set $V^c$ contains at least $k$ intermediate nodes (nodes other than $t$). Denote the $k$ oldest intermediate nodes in $V^c$ by $u_1$ to $u_k$. We denote the node index of each intermediate node $u_i$ by $NI(u_i)$. Note that some $u_i$ and $u_j$ may have the same index $NI(u_i) = NI(u_j)$ since $u_i$ and $u_j$ are intermediate nodes in the IFG, not the actual physical nodes. We then choose the $p$ vector by $p = (NI(u_1), NI(u_2), \dots, NI(u_k))$.

With the above construction, if we examine the definition of $z_i(p)$ in (30), we can easily see that the function $z_i(p)$ returns an upper bound on the number of edges entering $u_{i,\mathrm{in}}$ in the IFG from some $u_{j,\mathrm{out}}$ satisfying $j < i$. Therefore, $(d - z_i(p))^+$ represents a lower bound on the number of edges entering $u_{i,\mathrm{in}}$ that are not from $u_{j,\mathrm{out}}$ with $j < i$. Therefore, each $u_i$ will contribute at least $\min((d - z_i(p))^+\beta, \alpha)$ to the min-cut value. By summing over all $u_i$, we have (30). Since the analysis is quite standard, see [2, Lemma 2], we omit the detailed proof of (30).

Next, we will prove that

$$\min_{G \in \mathcal{G}_F} \min_{t \in \mathrm{DC}(G)} \mathrm{mincut}_G(s,t) \le \min_{\forall \pi_f} \sum_{i=1}^{k} \min\left((d - y_i(\pi_f))^+\beta, \alpha\right). \tag{31}$$

The reason is the following. Denote the smallest IFG in $\mathcal{G}_F(n, k, d, r, \alpha, \beta)$ by $G_0$. Specifically, all its nodes are intact, i.e., none of its nodes has failed before. Denote its active nodes arbitrarily by $1, 2, \dots, n$. Consider the family index permutation of the MFHS scheme $F$ that attains the minimal value of the right-hand side of (31) and call it $\tilde{\pi}_f$. Find a vector of node indices $p$ such that (i) $FI(p_i) = \tilde{\pi}_f(i)$ for $i = 1$ to $n$ and (ii) $p_i \ne p_j$ if $i \ne j$. This is always possible since the family index permutation $\tilde{\pi}_f$ can be viewed as transcribing some node index vector $p$ to the corresponding family indices.

After constructing $p$, we fail each active node in $\{1, 2, \dots, n\}$ of $G_0$ exactly once, starting with node $p_1$ and ending with node $p_n$. Along this failing process, at each step of repair, say we are now repairing $p_i$, we choose the unavailable nodes $U_i$ as follows. Among the $(d+r)$ nodes in $D(p_i)$, we first sort them according to their locations in the node index vector $p$. Namely, if both nodes $p_{j_1}$ and $p_{j_2}$ belong to $D(p_i)$, then we say $p_{j_1}$ is ahead of $p_{j_2}$ if $j_1 < j_2$. Once we have sorted the $(d+r)$ nodes in $D(p_i)$, we let the last $r$ nodes of $D(p_i)$ be temporarily unavailable during the repair of node $p_i$. Therefore, the helpers of the newcomer $p_i$ must be the first $d$ nodes of $D(p_i)$. After repairing all $n$ nodes according to the above description, we denote the final IFG as graph $G'$.

We use the following example to demonstrate the above failing/repair process. Let $(n,d,r) = (8,4,1)$ and suppose the minimizing family index permutation is $\tilde{\pi}_f = (1, 2, 1, -2, 0, 0, 1, 2)$. Then, a possible $p$ is $p = (1, 4, 2, 6, 7, 8, 3, 5)$, which satisfies $(FI(p_1), FI(p_2), \dots, FI(p_n)) = \tilde{\pi}_f$. Using the permutation $p$, we fail nodes 1, 4, 2, 6, 7, 8, 3, and 5 in this sequence. To illustrate how we choose the unavailable node set $U_i$ when failing node $p_i$, consider the fourth repair operation, for which node 6 fails and we want to repair it. Recall that node 6 belongs to the second complete family $\{4,5,6\}$. Therefore, $D(6) = \{1,2,3,7,8\}$. We sort $D(6)$ according to the locations of its members in $p$ and we thus have $D(6) = \{1,2,7,8,3\}$. Therefore, we assume $U = \{3\}$ is unavailable and the helper nodes of node 6 are $\{1,2,7,8\}$. Another example is when repairing node 8, i.e., the sixth repair operation. Since node 8 belongs to an incomplete family, the corresponding helper candidate set is $D(8) = \{1,2,3,4,5\}$. After sorting, we have $D(8) = \{1,4,2,3,5\}$. Therefore, we make node 5 temporarily unavailable when repairing node 8, and the actual helpers of node 8 become $\{1,2,3,4\}$. Note that $p$ may not be unique in our construction. For example, $p = (3, 5, 2, 6, 8, 7, 1, 4)$ is also a possible permutation satisfying $(FI(p_1), FI(p_2), \dots, FI(p_n)) = \tilde{\pi}_f$. Our construction holds for any arbitrary choice of $p$.
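The sorting step of this example can be reproduced with a few lines. The sketch below is illustrative only: the function name is hypothetical and the candidate sets for $(n,d,r) = (8,4,1)$ are written out by hand rather than derived.

```python
def repair_plan(p, D, d):
    """For each node of p, sort D(node) by position in p and split it into
    (helpers, unavailable): the first d nodes help, the last r are unavailable."""
    pos = {node: idx for idx, node in enumerate(p)}
    plan = {}
    for node in p:
        ordered = sorted(D[node], key=lambda v: pos[v])
        plan[node] = (ordered[:d], ordered[d:])
    return plan

# Candidate sets of the MFHS scheme for (n, d, r) = (8, 4, 1), as in the text.
D = {i: [4, 5, 6, 7, 8] for i in (1, 2, 3)}
D.update({i: [1, 2, 3, 7, 8] for i in (4, 5, 6)})
D.update({i: [1, 2, 3, 4, 5] for i in (7, 8)})
p = (1, 4, 2, 6, 7, 8, 3, 5)

plan = repair_plan(p, D, d=4)
print(plan[6])                               # fourth repair operation
print(plan[8])                               # sixth repair operation
```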

Consider a data collector $t$ in $G'$ that connects to the oldest $k$ newcomers, i.e., nodes $p_1$ to $p_k$. We now analyze the cut-value between the root $s$ and the data collector $t$ using the same arguments as in I?] Lemma 2]. Consider a special cut $(V, V^c)$ between $t$ and $s$ that is constructed as follows. Initially, we set $V^c = \{t\}$, containing only the data collector. Then, for each $i \in \{1, \dots, k\}$, if $\alpha \le (d - y_i(\tilde{\pi}_f))^+\beta$ then we add $x^{p_i}_{\mathrm{out}}$ to $V^c$. Namely, the out half of node $p_i$ is added to $V^c$. Otherwise, we include both $x^{p_i}_{\mathrm{out}}$ and $x^{p_i}_{\mathrm{in}}$ in $V^c$. With the above construction of $V^c$, it is not hard to see that the cut-value of the cut $(V, V^c)$ is equal to $\sum_{i=1}^{k} \min((d - y_i(\tilde{\pi}_f))^+\beta, \alpha)$.

Since the LHS of (31) further takes the minimum over $\mathcal{G}_F$ and all data collectors $t$, we have proven the inequality (31).

Thus far, we have that

$$\min_{p \in \mathcal{P}} \sum_{i=1}^{k} \min\left((d - z_i(p))^+\beta, \alpha\right) \le \min_{G \in \mathcal{G}_F} \min_{t \in \mathrm{DC}(G)} \mathrm{mincut}_G(s,t) \le \min_{\forall \pi_f} \sum_{i=1}^{k} \min\left((d - y_i(\pi_f))^+\beta, \alpha\right). \tag{32}$$

The remaining step is to prove that

$$\min_{p \in \mathcal{P}} \sum_{i=1}^{k} \min\left((d - z_i(p))^+\beta, \alpha\right) \ge \min_{\forall \pi_f} \sum_{i=1}^{k} \min\left((d - y_i(\pi_f))^+\beta, \alpha\right). \tag{33}$$

Once we prove (33), we have (24) since (32) is true. We prove (33) by showing that for any $p \in \mathcal{P}$ we can find a $\pi_f$ such that$^{**}$

$$z_i(p) \le y_i(\pi_f), \quad \forall i = 1, \dots, k. \tag{34}$$

One can clearly see that the existence of a $\pi_f$ satisfying (34) for any $p \in \mathcal{P}$ immediately implies (33).

In our previous work [3], lH, we have proven that (34) holds for the case of $r = 0$ and arbitrary $(n, k, d)$ values. We will now prove that (34) holds for arbitrary $(n, k, d, r)$ values.

A closer look at the definition of $z_i(\cdot)$ in the proof of Proposition 12 shows that different parameter values $(n, k, d, r)$ and $(n', k', d', r')$ can still lead to the same function $z_i(\cdot)$ for $i = 1$ to $n$, provided we have $n = n'$ and $d + r = d' + r'$. The reason is as follows. Suppose we apply the MFHS scheme to two different scenarios $(n,k,d,r)$ and $(n',k',d',r')$. If we have $n = n'$, then the total number of nodes is the same for both scenarios. Since each complete family contains $n - (d+r)$ nodes, if we also have $(d+r) = (d'+r')$, then MFHS will divide the nodes into families in the same way for both scenarios. Since in MFHS a newcomer requests help from outside its own family, the helper candidate set $D(i)$ will again be the same for both scenarios. Since the definition of $z_i(\cdot)$ in the proof of Proposition 12 depends only on the helper candidate set $D(j)$, the $z_i(\cdot)$ function will be identical in both scenarios.

We now argue that if two scenarios $(n,k,d,r)$ and $(n',k',d',r')$ satisfy $n = n'$ and $d + r = d' + r'$, the $y_i(\cdot)$ function in Proposition 12 will again be the same for both scenarios. The reason is as follows. By comparing the definitions of $y_i(\pi_f)$ and $z_i(p)$, one can quickly see that if we choose the family index permutation $\pi_f$ and the $p$ that satisfy $\pi_f = (FI(p_1), \dots, FI(p_n))$, then

$$y_i(\pi_f) = z_i(p) \quad \text{for all } i. \tag{35}$$

Namely, $y_i(\pi_f)$ can be viewed as a transcribed version of $z_i(p)$ from the node index $p$ to a family index $\pi_f$ if $p \in \mathcal{P}$. Since we have shown that $z_i(\cdot)$ will be the same for both scenarios and since, when $(d+r) = (d'+r')$, the node index to family index transcription will be identical for both scenarios, $y_i(\cdot)$ will also be identical for both scenarios.
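Both observations can be checked numerically for a small case. The sketch below is an illustration under stated assumptions: the function names are hypothetical, no negative family index is assigned when there is no incomplete family, and $p$ is restricted to permutations of the $n$ nodes so that $(FI(p_1), \dots, FI(p_n))$ is a family index permutation.

```python
import random

def families(n, d, r):
    """Return (D, FI): MFHS candidate sets and family indices for nodes 1..n."""
    size, D, FI = n - d - r, {}, {}
    c, rem = n // size, n % size
    for node in range(1, n + 1):
        fam = (node - 1) // size + 1
        if fam <= c:                         # complete family
            own = set(range((fam - 1) * size + 1, fam * size + 1))
            D[node] = set(range(1, n + 1)) - own
            FI[node] = -fam if (fam == c and rem and node > d + r) else fam
        else:                                # incomplete family
            D[node] = set(range(1, d + r + 1))
            FI[node] = 0
    return D, FI

def z(p, i, D):
    return len({pj for pj in p[: i - 1] if pj in D[p[i - 1]]})

def y(pi_f, i):
    cur, earlier = pi_f[i - 1], pi_f[: i - 1]
    if cur == 0:
        return sum(1 for v in earlier if v > 0)
    return sum(1 for v in earlier if abs(v) != abs(cur))

# Same n and same d + r: identical candidate sets and family indices.
print(families(8, 4, 1) == families(8, 5, 0))

# Check (35) on one random permutation p of the 8 nodes.
D, FI = families(8, 4, 1)
p = random.Random(1).sample(range(1, 9), 8)
pi_f = [FI[v] for v in p]
print(all(y(pi_f, i) == z(p, i, D) for i in range(1, 9)))
```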

Consider any arbitrarily given $(n,k,d,r)$ and use it to generate another scenario $(n',k',d',r')$ satisfying $n' = n$, $k' = k$, $d' = d + r$ and $r' = 0$. Consider an arbitrarily chosen $p \in \mathcal{P}$. For the scenario of $(n',k',d',r')$, since $r' = 0$, our previous results in 0 , 0 show that there exists a $\pi_f$ satisfying (34). Since the above paragraphs have proven that the functions $z_i(\cdot)$ and $y_i(\cdot)$ are identical for both scenarios $(n,k,d,r)$ and $(n',k',d',r')$, the $\pi_f$ that satisfies (34) for $(n',k',d',r')$ must also satisfy (34) for the given $(n,k,d,r)$. The proof of Proposition 12 is thus complete. ■

**We note that $\pi_f$ does not necessarily have to satisfy $\pi_f = (FI(p_1), FI(p_2), \dots, FI(p_n))$, and in fact $\pi_f = (FI(p_1), FI(p_2), \dots, FI(p_n))$ is not always possible. For illustration, consider $p = (1, 1, 1, 1, \dots, 1)$, which is a legitimate choice of $p \in \mathcal{P}$. However, for such $p$ it is impossible to find a family index permutation satisfying $\pi_f = (FI(p_1), FI(p_2), \dots, FI(p_n))$ since the vector $(FI(p_1), FI(p_2), \dots, FI(p_n))$ is not a family index permutation.

VII. Conclusion

We have shown that stationary helper selection (SHS) can be strictly suboptimal by carefully constructing an optimal binary code for $(n, k, d, r) = (5, 3, 2, 1)$ and $(5, 4, 2, 1)$ based on dynamic helper selection (DHS), where $r$ represents the maximum number of nodes that can be temporarily unavailable. For general $(n,k,d,r)$ values, we have answered the question of whether SHS/DHS can outperform blind helper selection (BHS) for a vast majority of $(n,k,d,r)$ values. The results thus provide valuable guidelines, for each $(n,k,d,r)$, on whether it is beneficial to spend time designing new SHS/DHS schemes or whether one should simply use the basic BHS.

Appendix A 

The Information Flow Graph 

We provide in this appendix the description of the information flow graph (IFG) that was first introduced in C). This appendix follows the same IFG description as in 0 .



Fig. 5. An example of the information flow graph with (n, k, d) = (4, 2, 2).

As shown in Fig. 5, an IFG has three different kinds of nodes. It has a single source node $s$ that represents the source of the data object. It also has nodes $x^i_{\mathrm{in}}$ and $x^i_{\mathrm{out}}$ that represent storage node $i$ of the IFG. A storage node is split into two nodes so that the IFG can represent the storage capacity of the nodes. We often refer to the pair of nodes $x^i_{\mathrm{in}}$ and $x^i_{\mathrm{out}}$ simply by storage node $i$. In addition to those nodes, the IFG has data collector (DC) nodes. Each data collector node is connected to a set of $k$ active storage nodes, which represents the party that is interested in extracting the original data object initially produced by the source $s$. Fig. 5 illustrates one such data collector, denoted by $t$, which connects to $k = 2$ storage nodes.

The IFG evolves with time. In the first stage of an information flow graph, the source node $s$ communicates the data object to all the initial nodes of the storage network. We represent this communication by edges of infinite capacity as this stage of the IFG is virtual. See Fig. 5 for illustration. This stage models the encoding of the data object over the storage network. To represent storage capacity, an edge of capacity $\alpha$ connects the input node of each storage node to the corresponding output node. When a node fails in the storage network, we represent that by a new stage in the IFG where, as shown in Fig. 5, the newcomer connects to its helpers by edges of capacity $\beta$, representing the amount of data communicated from each helper. We note that although the failed node still exists in the IFG, it cannot participate in helping future newcomers. Accordingly, we refer to failed nodes as inactive nodes and existing nodes as active nodes. By the nature of the repair problem, the IFG is always acyclic.

Given an IFG $G$, we use $\mathrm{DC}(G)$ to denote the collection of all data collector nodes in $G$ Q. Each data collector $t \in \mathrm{DC}(G)$ represents one unique way of choosing $k$ out of $n$ active nodes when reconstructing the file.
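To illustrate how a min-cut value is read off an IFG, the sketch below builds a small IFG by hand and computes the max-flow from $s$ to $t$ with a textbook Edmonds-Karp routine. Everything specific here is an assumption made for illustration: the repair history (node 3 fails and is replaced by node 5 with helpers 1 and 2, then node 4 fails and is replaced by node 6 with helpers 5 and 1) is hypothetical and is not claimed to match Fig. 5, and a large integer stands in for the infinite-capacity edges.

```python
from collections import defaultdict, deque

INF = 10**9                                  # stand-in for infinite capacity

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow on a dict-of-dicts residual capacity map."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:     # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:         # walk back from t to s
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push                # residual edge
        flow += push

def build_ifg(alpha, beta):
    """Hypothetical IFG with (n, k, d) = (4, 2, 2) after two repairs."""
    cap = defaultdict(lambda: defaultdict(int))
    for i in (1, 2, 3, 4):                   # initial stage
        cap["s"][f"x{i}_in"] += INF
        cap[f"x{i}_in"][f"x{i}_out"] += alpha
    for new, helper_nodes in ((5, (1, 2)), (6, (5, 1))):
        for h in helper_nodes:               # repair edges of capacity beta
            cap[f"x{h}_out"][f"x{new}_in"] += beta
        cap[f"x{new}_in"][f"x{new}_out"] += alpha
    for i in (5, 6):                         # data collector reads the two newcomers
        cap[f"x{i}_out"]["t"] += INF
    return cap

print(max_flow(build_ifg(alpha=2, beta=1), "s", "t"))
```

In this hypothetical graph the second newcomer receives one of its two repair edges from the first newcomer, so only one fresh edge of capacity $\beta$ crosses the cut for it.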

Appendix B 

Proof of Proposition 5

The statement that BHS is optimal if Coll holds is a restatement of Proposition 3. We now prove that under the additional assumption $d = r = 1$, the BHS scheme is optimal if either $k = 3$ or if $k = 4$ and $n \bmod 3 \ne 0$.

We first give the following definition of an m-tree that will 
be useful in our proof. 

Definition 5: Consider $(n,k,d,r)$ such that $d = 1$. Consider an IFG $G$ and a set of $m$ active nodes of $G$ denoted by $x^1, x^2, \dots, x^m$. The set of $m$ active nodes $\{x^1, \dots, x^m\}$ is said to be an $m$-tree if the following two properties hold simultaneously. (a) For $i, j \in \{1, 2, \dots, m-1\}$ and $j > i$, $x^i$ is repaired before $x^j$; (b) for any $i = 2$ to $m$, there exists a node $b \in \{1, \dots, i-1\}$ such that $(x^b_{\mathrm{out}}, x^i_{\mathrm{in}})$ is an edge in $G$.

The reason that we call the above $m$ nodes an $m$-tree is that, since $d = 1$, there is exactly 1 edge entering each node $x^i_{\mathrm{in}}$. The above condition (b) thus implies that each node $x^i_{\mathrm{in}}$ is connected to one of the previous nodes $x^1_{\mathrm{out}}$ to $x^{i-1}_{\mathrm{out}}$. Therefore, these $m$ nodes form a tree.

We first consider the case of $k = 3$, and we state the following claim.

Claim 1: Consider $(n,k,d,r)$ parameters that satisfy $d = r = 1$. For any given DHS scheme $A$ and the corresponding collection of IFGs $\mathcal{G}_A$, we can always find a $G^* \in \mathcal{G}_A$ such that there exists a 3-tree in its set of active nodes.

We now use the above claim to prove that BHS is optimal if $k = 3$. Suppose the above claim is true. We let $t^*$ denote the data collector that is connected to the 3-tree. By properties (a) and (b) in Definition 5 of a 3-tree, we can see that node $x^1$ is a vertex-cut separating source $s$ and the data collector $t^*$. The min-cut value separating $s$ and $t^*$ thus satisfies $\mathrm{mincut}_{G^*}(s, t^*) = \min(\beta, \alpha)$ for the given $G^* \in \mathcal{G}_A$ and the specific choice of $t^*$. Also note that $\min(\beta, \alpha) = \sum_{i=0}^{k-1} \min((d-i)^+\beta, \alpha)$ since we assume $d = 1$ and $k = 3$. Combining both, we thus have $\mathrm{mincut}_{G^*}(s, t^*) = \sum_{i=0}^{k-1} \min((d-i)^+\beta, \alpha)$. By the BHS tradeoff curve formula in (5), we thus have that BHS is optimal when $k = 3$ holds.

Proof of Claim 1: We prove Claim 1 by explicit construction. Start from any $G \in \mathcal{G}_A$ in which all $n$ nodes have been repaired at least once. We choose one arbitrary active node in $G$ and denote it by $w^{(1)}$. We let $w^{(1)}$ fail and denote the newcomer that replaces $w^{(1)}$ by $y^{(1)}$. The helper selection scheme $A$ will choose a helper node (since $d = 1$) and we denote that helper node as $x^{(1)}$. The new IFG after this failure and repair process is denoted by $G^{(1)}$. By our construction $x^{(1)}$, as an existing active node, is repaired before the newcomer $y^{(1)}$ and there is an edge $(x^{(1)}_{\mathrm{out}}, y^{(1)}_{\mathrm{in}})$ in $G^{(1)}$.

Now starting from $G^{(1)}$, we choose another $w^{(2)}$ which is not one of $x^{(1)}$ and $y^{(1)}$ and let this node fail. We use $y^{(2)}$ to denote the newcomer that replaces $w^{(2)}$. The helper selection scheme $A$ will again choose a helper node based on the history of the failure pattern. We denote the new IFG (after the helper selection chosen by scheme $A$) as $G^{(2)}$. If the helper node of $y^{(2)}$ is $x^{(1)}$ or $y^{(1)}$, then the three nodes $(x^{(1)}, y^{(1)}, y^{(2)})$ are the $(x^1, x^2, x^3)$ nodes satisfying properties (a) and (b) of Definition 5 of a 3-tree. In this case, we can stop our construction and let $G^* = G^{(2)}$ and we say that the construction is complete in the second round. Suppose the above case is not true, i.e., the helper of $y^{(2)}$ is neither $x^{(1)}$ nor $y^{(1)}$. Then, we denote the helper of $y^{(2)}$ by $x^{(2)}$. Note that after this step, $G^{(2)}$ contains two disjoint pairs of active nodes such that there is an edge $(x^{(m)}_{\mathrm{out}}, y^{(m)}_{\mathrm{in}})$ in $G^{(2)}$ for $m = 1, 2$.

We can repeat this process for the third time by failing a node $w^{(3)}$ that is none of $\{x^{(m)}, y^{(m)} : \forall m = 1, 2\}$. Again, let $y^{(3)}$ denote the newcomer that replaces $w^{(3)}$ and the scheme $A$ will choose a helper for $y^{(3)}$. The new IFG after this failure and repair process is denoted by $G^{(3)}$. If the helper of $y^{(3)}$ is $x^{(m)}$ or $y^{(m)}$ for some $m = 1, 2$, then the three nodes $(x^{(m)}, y^{(m)}, y^{(3)})$ are the $(x^1, x^2, x^3)$ nodes in Definition 5 satisfying properties (a) and (b). In this case, we can stop our construction and let $G^* = G^{(3)}$ and we say that the construction is complete in the third round. If the above case is not true, then we denote the helper of $y^{(3)}$ by $x^{(3)}$ and repeat this process for the fourth time and so on.

If the construction is not complete in the $m$-th round for some $m \le \lceil n/2 \rceil - 1$, we can always start the $(m+1)$-th round since out of the $n$ nodes, we can always find another node $w^{(m+1)}$ that is none of $\{x^{(m')}, y^{(m')} : \forall m' = 1, 2, \dots, m\}$. Now, suppose that $n$ is odd and the construction is not completed after $m_0 = \lceil n/2 \rceil - 1$ rounds. In this case, there is only 1 remaining node that is not inside $\{x^{(m)}, y^{(m)} : \forall m = 1, 2, \dots, m_0\}$. Denote that node as $w^{(m_0+1)}$. Fail $w^{(m_0+1)}$ and replace it by $y^{(m_0+1)}$. Since $\{x^{(m)}, y^{(m)} : \forall m = 1, 2, \dots, m_0\}$ and $y^{(m_0+1)}$ cover all $n$ nodes, the helper node of $y^{(m_0+1)}$ must be one of the nodes in $\{x^{(m)}, y^{(m)} : \forall m = 1, 2, \dots, m_0\}$. If the helper node of $y^{(m_0+1)}$ is $x^{(m')}$ or $y^{(m')}$ for some $m' = 1, 2, \dots, m_0$, then the three nodes $(x^{(m')}, y^{(m')}, y^{(m_0+1)})$ form a 3-tree satisfying properties (a) and (b) of Definition 5.

Now consider the case when $n$ is even and the construction is not completed after $m_0 = \lceil n/2 \rceil - 1$ rounds. In this case, there are 2 remaining nodes that are not inside $\{x^{(m)}, y^{(m)} : \forall m = 1, 2, \dots, m_0\}$. Choose arbitrarily one of them and denote that node as $w^{(m_0+1)}$. Fail $w^{(m_0+1)}$ and replace it by $y^{(m_0+1)}$ while having the other remaining node (the one that is not $w^{(m_0+1)}$) temporarily unavailable when repairing $y^{(m_0+1)}$. Therefore, we have forced $y^{(m_0+1)}$ to connect to an $x^{(m')}$ or $y^{(m')}$ node for some $m' = 1, 2, \dots, m_0$. Similar to the case for which $n$ is odd, the three nodes $(x^{(m')}, y^{(m')}, y^{(m_0+1)})$ form a 3-tree. The proof of Claim 1 is complete. ■
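The round-by-round construction in the proof of Claim 1 can be simulated directly. The sketch below is an illustration under stated assumptions: `choose_helper` is a hypothetical stand-in for an arbitrary scheme $A$ with $d = 1$ (here a random choice), $n \ge 3$ is assumed, and a node is made unavailable only in the final round of the even-$n$ case, as in the proof.

```python
import random

def find_three_tree(n, choose_helper):
    """Run the rounds of the Claim 1 construction for d = r = 1 (n >= 3).

    choose_helper(w, candidates) returns the single helper of the newcomer
    replacing node w, chosen among the currently available candidates.
    Returns ((x, y, y_new), helper_of).
    """
    pair_of = {}                             # node -> (x, y) pair that contains it
    helper_of = {}
    while True:
        free = [v for v in range(1, n + 1) if v not in pair_of]
        w = free[0]                          # next node to fail
        # n even, last round: keep the other free node unavailable
        blocked = {free[1]} if len(free) == 2 else set()
        candidates = [v for v in range(1, n + 1) if v != w and v not in blocked]
        x = choose_helper(w, candidates)
        helper_of[w] = x
        if x in pair_of:                     # helper lies in an earlier pair
            return pair_of[x] + (w,), helper_of
        pair_of[x] = pair_of[w] = (x, w)     # new disjoint pair (x, y)

rng = random.Random(0)
tree, helper_of = find_three_tree(8, lambda w, cands: rng.choice(cands))
print(tree)
```

Each round that does not finish adds two fresh nodes to the pairs, so the loop ends after at most $\lceil n/2 \rceil$ rounds.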

Now, we turn our attention to the case when $k = 4$ and $n \bmod 3 \ne 0$. Similarly, we state the following claim.

Claim 2: Consider $(n,k,d,r)$ parameters that satisfy $d = r = 1$ and $n \bmod 3 \ne 0$. For any given DHS scheme $A$ and the corresponding collection of IFGs $\mathcal{G}_A$, we can always find a $G^{**} \in \mathcal{G}_A$ such that there exists a 4-tree in its set of active nodes.

We now use the above claim to prove that BHS is optimal if $k = 4$ and $n \bmod 3 \ne 0$. Suppose the above Claim 2 is true. As we did above, we let $t^{**}$ denote the data collector that is connected to the 4-tree. By properties (a) and (b) of the definition of a 4-tree we can again see that node $x^1$ is a vertex-cut separating source $s$ and the data collector $t^{**}$. The min-cut value separating $s$ and $t^{**}$ thus satisfies $\mathrm{mincut}_{G^{**}}(s, t^{**}) \le \min(d\beta, \alpha) = \sum_{i=0}^{k-1} \min((d-i)^+\beta, \alpha)$ for $G^{**} \in \mathcal{G}_A$ and the specific choice of $t^{**}$, where the inequality, as discussed before, follows from $x^1$ being a vertex-cut separating $s$ and $t^{**}$, and the equality follows from the assumption that $d = 1$ and $k = 4$. We thus have that BHS is optimal when $k = 4$ and $n \bmod 3 \ne 0$.

Proof of Claim 2: We prove this claim by explicit construction. The construction contains 2 phases. The goal of Phase 1 is to convert all nodes to be either part of a 2-set or part of a 3-tree. We start from time 1, when no node has ever been repaired. Let $V$ denote a subset of nodes. Initially, set $V = \emptyset$. We arbitrarily choose one node in $\{1, \dots, n\} \setminus V$, say node $w$. We fail node $w$. The newcomer is denoted by $y$ and we let $x$ denote its helper. After repairing $y$, we add both $\{x, y\}$ to the node set $V$. After updating $V$, we again choose arbitrarily a $w \in \{1, \dots, n\} \setminus V$, fail it, and replace it by a newcomer $y$ with the corresponding helper being $x$. If the helper $x$ is already in $V$, then we add $y$ to $V$. If $x$ is not in $V$, we add both $\{x, y\}$ to $V$. Repeat the above process until $\{1, \dots, n\} \setminus V = \emptyset$.

Consider two possibilities. If the resulting IFG contains 
a 4-tree, then our construction is complete. If not, then we 
argue that all the nodes in V (n nodes in total when the 
construction terminates) must either be in a 2-set or in a 3-tree 
but cannot be in both. We prove this by induction. Suppose V 
contains only disjoint 2-sets or 3-trees during the construction. 
Consider our iterative construction, for which we choose a
node $w \in \{1, \ldots, n\} \setminus V$, fail it, and replace it by a newcomer $y$
with the corresponding helper being $x$. If $x \notin V$ already,
after adding $\{x, y\}$ to $V$ the new pair $(x, y)$ will form a 2-set
that is disjoint from all the previous nodes in $V$. The induction
assumption holds. If $x \in V$ already, we claim that $x$ must be
part of a 2-set. The reason why x cannot be part of a 3-tree 
is that if so, then the 3-tree plus the newcomer y will form 
a 4-tree and we have already ruled out such a possibility by 
focusing on the case for which the construction does not lead 
to any 4-tree. 

We are now ready for Phase 2. Recall that after Phase 1 all $n$
nodes have been partitioned into a collection of disjoint 2-sets
or 3-trees. Pick arbitrarily a 2-set in the active nodes of the IFG.
This is always possible since $n \bmod 3 \neq 0$, which implies that
the $n$ nodes cannot all be 3-trees. Denote the chosen 2-set by
$\{v, w\}$. Fail node $w$ and during its repair let $v$ be unavailable.
Call the newcomer $w'$. If $w'$ connects to a node that belongs
to a 3-tree, then the 3-tree and the newcomer $w'$ form a 4-tree.
The construction is thus finished.

If w' connects to a node that belongs to a 2-set, then the 
2-set and w' now form a 3-tree. Namely, we have converted 
w, part of a 2-set, to a new node w' being part of a 3-tree. We 
then fail node v and replace it by a newcomer v'. Similarly, 
if v' connects to a node that belongs to a 3-tree, then the 3- 
tree and the newcomer v' form a 4-tree. The construction is 
finished. If v' connects to a node that belongs to a
2-set, then the 2-set and v' now form a 3-tree. Specifically, we
have converted v, part of a 2-set, to a new node v' being part
of a 3-tree. One can see that the above procedure removes the
2-set {v, w} from the IFG and replaces it by v' and w' that
participate in two different 3-trees.

We then iteratively repeat the above process to convert all
2-sets into 3-trees. This is always possible since $n \bmod 3 \neq 0$,
which implies that the $n$ nodes cannot all be 3-trees. Nonetheless,
we cannot repeat this process indefinitely since each
round will remove one 2-set and we only have finitely many
2-sets. This implies that the process must terminate after
finitely many rounds. Specifically, either w' or v' will be connected to a
3-tree and we will have a 4-tree at the end of this construction.

The proof of Claim 2 is complete. ■

Thus far, we have proven the converse part of Proposition 5,
that BHS is optimal when at least one of conditions (i)-(iii) is satisfied.
Now, we notice that, under the assumption of $d = r = 1$,
the statement “none of (i)-(iii) holds” is equivalent to “at least
one of the following conditions holds: (a) $n \bmod 3 = 0$ and
$k = 4$, or (b) $k \ge 5$.” The reason is the simple observation
that when focusing on $d = r = 1$, condition (i) is equivalent
to “$k \le 2$” since the ceiling term in condition (i), whose
denominator is $n - d - r$, equals 2. Therefore, not satisfying (i)
to (iii) is equivalent to satisfying one of conditions (a) and (b).

In the following, we will prove that there exists an SHS
scheme that can outperform BHS when the $(n, k, d, r)$
parameters satisfy at least one of conditions (a) or (b). Suppose
condition (a) is satisfied. We prove the existence of such an SHS
scheme by explicit construction.

Our construction is as follows. Since $n \bmod 3 = 0$, we can
divide the $n$ nodes into $n/3$ groups of 3 nodes. Suppose the file to
be protected has size $\mathcal{M}$. We first divide the file into 2 packets,
each of size $\mathcal{M}/2$. We call each packet a systematic packet,
which is analogous to the concept of systematic bits in error
control coding. We then use an $(n/3, 2)$ MDS code to protect
the systematic packets by adding $n/3 - 2$ parity packets. Finally,
each group of 3 nodes is associated with one distinct packet
(either a systematic or a parity packet). Each packet is
then duplicated 3 times and all 3 nodes in the same group
store an identical copy of the packet of that group.

We argue that such a system realizes $(n, k, d, r, \alpha, \beta)$ values
satisfying $k = 4$, $d = r = 1$, and $\alpha = \beta = \mathcal{M}/2$. The reason is as
follows. $\alpha = \mathcal{M}/2$ since each node only stores 1 packet of size
$\mathcal{M}/2$. Since $k = 4 > 3$, any $k$ nodes must belong to at least
2 different groups and the nodes in these $\ge 2$ groups must
collectively contain $\ge 2$ distinct packets. Because we use an
$(n/3, 2)$ MDS code to protect the file, one can reconstruct the
original file by accessing any $k = 4$ nodes. We now consider
the repair operation. Suppose a node fails and we consider the
other 2 nodes of the same group. Since $r = 1$, at least one of
the other two nodes must still be available. The newcomer can
thus ask the remaining available node to transfer the packet it
stores to the newcomer. Therefore exact-repair can be achieved
with $d = r = 1$ and $\beta = \mathcal{M}/2$.
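The construction above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the text requires some $(n/3, 2)$ MDS code but does not name one, so the sketch assumes a toy code over the prime field GF(257) in which group $j$ stores $x_1 + j x_2$; the names `P`, `encode`, `store`, `reconstruct`, and `repair` are hypothetical.

```python
from itertools import combinations

P = 257  # assumed prime modulus of the toy field; needs n/3 < P


def encode(x1, x2, num_groups):
    """Assumed (num_groups, 2) MDS code: packet j is x1 + j*x2 mod P."""
    return [(x1 + j * x2) % P for j in range(1, num_groups + 1)]


def store(n, x1, x2):
    """Node i (0-based) belongs to group i // 3 and stores that group's packet."""
    assert n % 3 == 0
    packets = encode(x1, x2, n // 3)
    return [packets[i // 3] for i in range(n)]


def reconstruct(nodes, contents):
    """Recover (x1, x2) from any node set that touches at least 2 groups."""
    by_group = {i // 3: contents[i] for i in nodes}
    (g1, p1), (g2, p2) = sorted(by_group.items())[:2]
    j1, j2 = g1 + 1, g2 + 1
    x2 = (p1 - p2) * pow((j1 - j2) % P, -1, P) % P
    x1 = (p1 - j1 * x2) % P
    return x1, x2


def repair(failed, unavailable, contents):
    """Exact repair with d = r = 1: copy from an available node of the same group."""
    group = [i for i in range(len(contents)) if i // 3 == failed // 3]
    helper = next(i for i in group if i != failed and i != unavailable)
    return contents[helper]


contents = store(9, 42, 99)
# Any k = 4 nodes touch at least two groups, so the file is recoverable.
ok = all(reconstruct(s, contents) == (42, 99) for s in combinations(range(9), 4))
```

With `n = 9`, every choice of 4 nodes recovers the two systematic packets, and a failed node can always be restored from a same-group node even when one other node is unavailable.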

We now compare the performance to a BHS scheme for the
same $(n, k, d, r)$ parameter value. By (6), the tradeoff curve
of BHS when $d = r = 1$ and $k = 4$ becomes

$$\sum_{i=0}^{k-1} \min\big((d-i)^+\beta, \alpha\big) = \min(\beta, \alpha) \ge \mathcal{M}. \tag{36}$$

One can clearly see that our parameter values $\alpha = \beta = \mathcal{M}/2$
do not satisfy (36). As a result, the above scheme strictly
outperforms the BHS scheme.
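The comparison with the BHS tradeoff can be checked numerically. The sketch below assumes that the left-hand side of (36) is the usual sum $\sum_{i=0}^{k-1}\min((d-i)^+\beta,\alpha)$ and normalizes the file size to $\mathcal{M} = 1$; the function name `bhs_lhs` is hypothetical.

```python
from fractions import Fraction


def bhs_lhs(k, d, alpha, beta):
    """Assumed left-hand side of (36): sum_{i=0}^{k-1} min((d - i)^+ * beta, alpha)."""
    return sum(min(max(d - i, 0) * beta, alpha) for i in range(k))


M = Fraction(1)           # normalized file size
alpha = beta = M / 2      # operating point of the grouped scheme
lhs = bhs_lhs(k=4, d=1, alpha=alpha, beta=beta)
violates_bhs_bound = lhs < M   # BHS would need lhs >= M
```

Only the $i = 0$ term is non-zero when $d = 1$, so the sum collapses to $\min(\beta, \alpha) = \mathcal{M}/2$, which is below $\mathcal{M}$.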

Suppose now that condition (b) is satisfied, i.e., $k \ge 5$. Our
construction is almost identical to the scheme we described for
condition (a). That is, depending on the value of $n \bmod 3$,
we can either divide the $n$ nodes into $n/3$ groups of 3 nodes; or
$\lfloor n/3 \rfloor - 1$ groups of 3 nodes plus 1 group of 4 nodes; or $\lfloor n/3 \rfloor - 2$
groups of 3 nodes plus 2 groups of 4 nodes. Regardless of
which case we are in, we again divide the file of size $\mathcal{M}$ into
2 packets, each of size $\mathcal{M}/2$. Then we protect the two packets by
an $(\lfloor n/3 \rfloor, 2)$ MDS code. Associate each group with one coded
packet and let the nodes of each group store an identical copy
of that packet. Since every group has at most 4 nodes, any
$k \ge 5$ nodes must belong to at least two different groups.
Since any 2 packets can be used to recover the original file,
the proposed scheme can reconstruct the original file from any
$k$ nodes. By similar reasons as before, exact-repair can also
be achieved with $d = r = 1$ and $\beta = \mathcal{M}/2$. We now compare
the performance to a BHS scheme for the same $(n, k, d, r)$
parameter value. By (6), the tradeoff curve of BHS when
$d = r = 1$ and $k \ge 5$ is again (36). The above scheme with
$\alpha = \beta = \mathcal{M}/2$ thus strictly outperforms the BHS scheme. The proof
of Proposition 5 is hence complete.
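The grouping used for condition (b) can be illustrated as follows. This is a minimal sketch, assuming $n \ge 6$ so that there are enough groups to absorb the $n \bmod 3$ extra nodes; `group_sizes` and `min_groups_touched` are hypothetical helper names.

```python
def group_sizes(n):
    """Split n nodes into floor(n/3) groups, each of size 3 or 4."""
    q, rem = divmod(n, 3)
    assert q >= rem, "need at least n mod 3 groups to absorb the extra nodes"
    return [4] * rem + [3] * (q - rem)


def min_groups_touched(sizes, k):
    """Fewest distinct groups that k nodes can belong to (fill largest groups first)."""
    count, remaining = 0, k
    for s in sorted(sizes, reverse=True):
        if remaining <= 0:
            break
        remaining -= s
        count += 1
    return count


sizes = group_sizes(8)                          # two groups of 4 nodes
groups_for_k5 = min_groups_touched(sizes, 5)    # any 5 nodes span 2 groups
groups_for_k4 = min_groups_touched(sizes, 4)    # 4 nodes may sit in one group
```

The last line shows why $k \ge 5$ is needed once groups of 4 nodes appear: four nodes can all hold the same packet.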


Appendix C 

The Gap Between (10) and (11) When r = 1 and d = 2


We want to show in the following that, when $r = 1$ and
$d = 2$, (10) and (11) cover all the range of parameters that
satisfy (1) except for the points $(n, k, d, r) = (5, 3, 2, 1)$ and
$(5, 4, 2, 1)$. When $r = 1$ and $d = 2$, the ceiling term of (10) becomes
$\left\lceil \frac{n-1}{n-3} \right\rceil$
and the ceiling term of (11) becomes
$\left\lceil \frac{n}{n-3} \right\rceil$.
Since $d \le n - 1 - r$, we must have $n \ge 4$ when $r = 1$ and $d = 2$. In
the following we analyze the gap between the two conditions
“$k \le \left\lceil \frac{n-1}{n-3} \right\rceil$” and
“$\min(d + 1, k) > \left\lceil \frac{n}{n-3} \right\rceil$” for different $n$
values.


For $n = 4$, (10) becomes $k \le \lceil 3/1 \rceil = 3$ and (11) becomes
$\min(3, k) > \lceil 4/1 \rceil = 4$. By (1), we must have that $k \le n - 1 =
3$. Therefore, for the scenarios of $n = 4$, $d = 2$, and $r = 1$,
all possible $(n, k, d, r)$ values satisfy (10) and none of them
satisfy (11).

For $n = 5$, (10) becomes $k \le \lceil 4/2 \rceil = 2$. By (1), we must
have that $k \le n - 1 = 4$. For $k = 1, 2$, (10) is satisfied. On the
other hand, $k = 3, 4$ cannot satisfy (10). Since (11) becomes
$\min(3, k) > \lceil 5/2 \rceil = 3$, no $k$ value can satisfy (11). We thus
have that points $(n, k, d, r) = (5, 4, 2, 1)$ and $(5, 3, 2, 1)$ satisfy
neither (10) nor (11).

For $n \ge 6$, we first observe that $1 < \frac{n-1}{n-3} < \frac{n}{n-3} \le 2$
whenever $n \ge 6$. The reason is as follows. The first 2 strict
inequalities are straightforward. The last inequality follows from
that $\frac{n}{n-3}$ is monotonically decreasing with $n$ and $\frac{6}{6-3} = 2$. The
above observation thus ensures that when $n \ge 6$, (10) becomes
$k \le \left\lceil \frac{n-1}{n-3} \right\rceil = 2$ and (11) becomes
$\min(3, k) > \left\lceil \frac{n}{n-3} \right\rceil = 2$.
We can see that there is no gap between these two conditions.
Therefore, for $n \ge 6$, all possible parameters satisfy either
(10) or (11).

We have thus shown that the only points that conditions (10)
and (11) do not cover are the points $(n, k, d, r) = (5, 4, 2, 1)$
and $(5, 3, 2, 1)$.
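The case analysis of this appendix can be reproduced by brute force. The sketch below assumes that, for $r = 1$ and $d = 2$, the two conditions read $k \le \lceil (n-1)/(n-3) \rceil$ and $\min(3, k) > \lceil n/(n-3) \rceil$, and that (1) restricts $k$ to $1 \le k \le n - 1$; the helper names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def uncovered_points(n_max):
    """List (n, k, d, r) with d = 2, r = 1 that satisfy neither assumed condition."""
    d, r = 2, 1
    gaps = []
    for n in range(4, n_max + 1):          # n >= 4 follows from d <= n - 1 - r
        for k in range(1, n):              # k <= n - 1
            cond10 = k <= ceil_div(n - 1, n - 3)
            cond11 = min(d + 1, k) > ceil_div(n, n - 3)
            if not (cond10 or cond11):
                gaps.append((n, k, d, r))
    return gaps


gaps = uncovered_points(50)
```

Under these assumptions the enumeration returns exactly the two points $(5, 3, 2, 1)$ and $(5, 4, 2, 1)$ named in the text.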


Appendix D 

Proofs of Lemma 1 and Proposition 7

In this appendix, we prove both Lemma 1 and Proposition 7
simultaneously.

The proof is by induction with the following two induction
conditions: (A) if node $i$ is the parent of node $j$, then jointly
nodes $i$ and $j$ contain 3 linearly independent packets, (B) if
neither $i$ is a parent of $j$ nor $j$ is a parent of $i$, then jointly
nodes $i$ and $j$ contain 4 linearly independent packets. Let
time $\tau = 0$ be the stage where the storage network is still
intact, i.e., no node has failed before. By checking all $\binom{5}{2}$
pairs of nodes, we can easily see that the initial code in Fig. 2
satisfies the induction conditions (A) and (B) at $\tau = 0$. In the
following, we first show that conditions (A) and (B) guarantee
that we can always find packets $(Z_b^*, Z_c^*)$ using the regular
repair operations and that the induction conditions (A) and
(B) will remain satisfied for the new code.

Now, let us assume that the induction conditions (A) and
(B) are satisfied until time $\tau = \tau_0 - 1$ and, using the same
notation as above, node $a$ is the failed node with nodes $\{b, c\}$
selected as helpers by the CA scheme. We also use the same
notation as before for the two packets in node $i$.
We introduce some vector notation to aid in this proof. We
first let $\mathbf{m}$ be a $4 \times 1$ column vector holding the 4 packets
of the file such that $\mathbf{m}^T = (X_1\ X_2\ X_3\ X_4)$. Since the
file size is 4, we can express each coded packet by a $4 \times 1$
column vector $\mathbf{v}$ over the binary field. Specifically, we denote
the vectors of the packets in node $i$ by $(\mathbf{v}_1^{(i)}, \mathbf{v}_2^{(i)})$, which
means that the two packets in node $i$ are $\mathbf{m}^T \mathbf{v}_1^{(i)}$ and $\mathbf{m}^T \mathbf{v}_2^{(i)}$.

With the above vector notation, consider node $b$ and denote
the linear span of the vectors $(\mathbf{v}_1^{(b)}, \mathbf{v}_2^{(b)})$ of node $b$ by $B$.
Specifically, $B = \operatorname{span}(\{\mathbf{v}_1^{(b)}, \mathbf{v}_2^{(b)}\}) = \{\mathbf{0}, \mathbf{v}_1^{(b)}, \mathbf{v}_2^{(b)}, \mathbf{v}_1^{(b)} + \mathbf{v}_2^{(b)}\}$,
where $\mathbf{0}$ is the zero vector. Similarly, denote by $C$, $D$,
and $E$ the span of the vectors of nodes $c$, $d$, and $e$, respectively.
In the following, we give an equivalent mathematical
presentation of the regular repair operations that choose the
$(Z_b^*, Z_c^*)$ packets based on the linear spans $B$, $C$, $D$, and $E$.

Choosing the $Z_b^*$ packet: Using the above vector notation,
how we choose $Z_b^*$ can be rewritten as follows. If $\operatorname{rank}(C \oplus
D) = 4$, where $\oplus$ is the sum-space operator, then construct
set $S_1 = D$. If $\operatorname{rank}(C \oplus D) \le 3$, then construct set $S_1 =
C \oplus D$. If $\operatorname{rank}(C \oplus E) = 4$, then construct set $S_2 = E$. If
$\operatorname{rank}(C \oplus E) \le 3$, then construct set $S_2 = C \oplus E$. Then, we
choose arbitrarily a vector $\mathbf{v}_b \in B \setminus (S_1 \cup S_2)$ and then send
$Z_b^* = \mathbf{m}^T \mathbf{v}_b$.

We now explain the reason why the above new code
construction method is equivalent to the previous description
of the repair operations. To that end, we notice that whenever
$\mathbf{v}_b \notin S_1$, then the coded packet $Z_b^*$ will satisfy Condition 1
in our construction. Similarly, whenever $\mathbf{v}_b \notin S_2$, the coded
packet $Z_b^*$ will satisfy Condition 2 in our construction. Since
we choose $\mathbf{v}_b \in B \setminus (S_1 \cup S_2)$, the $Z_b^*$ will simultaneously
satisfy both conditions.

We will argue now that we can always find such a vector
$\mathbf{v}_b$. To that end, we will first prove that regardless of how we
construct $S_i$, $i = 1, 2$, we always have $\operatorname{rank}(B \cap S_i) \le 1$.
Since the constructions of $S_1$ and $S_2$ are symmetric (one focusing on
spaces $C$ and $D$, and the other on spaces $C$ and $E$), we prove
only $\operatorname{rank}(B \cap S_1) \le 1$. Consider two cases. Case 1: $\operatorname{rank}(C \oplus
D) = 4$. In this case, $S_1 = D$, and we have

\begin{align}
\operatorname{rank}(B \cap S_1) &= \operatorname{rank}(B) + \operatorname{rank}(D) - \operatorname{rank}(B \oplus D) \notag \\
&\le 2 + 2 - \operatorname{rank}(B \oplus D) \tag{37} \\
&\le 1, \tag{38}
\end{align}

where (37) follows from that the rank of the space of each
node is at most 2, and (38) follows from that the induction
conditions (A) and (B) imply that the rank of the sum space
of two nodes is either 3 or 4, depending on whether one is
the parent of the other.

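Case 2: $\operatorname{rank}(C \oplus D) \le 3$. In this case $S_1 = C \oplus D$ and
we have

\begin{align}
\operatorname{rank}(B \cap S_1) &= \operatorname{rank}(B \cap (C \oplus D)) \notag \\
&= \operatorname{rank}(B) + \operatorname{rank}(C \oplus D) - \operatorname{rank}(B \oplus (C \oplus D)) \notag \\
&\le 2 + 3 - \operatorname{rank}(B \oplus (C \oplus D)) \tag{39} \\
&= 2 + 3 - 4 \tag{40} \\
&= 1, \notag
\end{align}

where (39) follows from $\operatorname{rank}(B) \le 2$ and $\operatorname{rank}(C \oplus
D) \le 3$, and (40) follows from the fact that, because we use
the Clique-Avoiding algorithm, among the three nodes $b$,
$c$, and $d$, at least two of them do not have the parent-child
relationship. By the induction assumption (B), we must have
$\operatorname{rank}(B \oplus C \oplus D) = 4$.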

The above arguments prove that $\operatorname{rank}(B \cap S_i) \le 1$ for
$i = 1, 2$. If we count the number of elements in $B \cap S_1$ and
$B \cap S_2$, then we must have $|B \cap S_i| \le 2^1 = 2$ for $i = 1, 2$.
Therefore, the size of $(B \cap S_1) \cup (B \cap S_2)$ is at most 3 since
both $B \cap S_1$ and $B \cap S_2$ are linear subspaces and both thus
contain the zero vector as a common element. As a result,

$$|B \setminus (S_1 \cup S_2)| = |B| - |(B \cap S_1) \cup (B \cap S_2)| \ge 2^2 - 3 = 1.$$

Therefore, there exists at least one vector $\mathbf{v}_b \in B \setminus (S_1 \cup S_2)$.
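The counting step can be sanity-checked exhaustively over the binary field. The sketch below is illustrative only: it encodes vectors of $GF(2)^4$ as the integers 0 to 15 (XOR is vector addition), enumerates every subspace, and checks that whenever $B$ has rank 2 and both intersections $B \cap S_1$ and $B \cap S_2$ have at most 2 elements, at least one vector of $B$ avoids $S_1 \cup S_2$. The names `span`, `subspaces`, and `planes` are hypothetical.

```python
from itertools import combinations


def span(vectors):
    """All GF(2) linear combinations of 4-bit vectors, encoded as ints 0..15."""
    s = {0}
    for v in vectors:
        s |= {u ^ v for u in s}
    return frozenset(s)


# Every subspace of GF(2)^4 is spanned by at most 4 vectors.
subspaces = {span(c) for m in range(5) for c in combinations(range(1, 16), m)}
planes = [B for B in subspaces if len(B) == 4]   # rank-2 subspaces

# Smallest number of admissible vectors v_b over all allowed (B, S1, S2).
worst = min(
    len(B - (S1 | S2))
    for B in planes
    for S1 in subspaces if len(B & S1) <= 2
    for S2 in subspaces if len(B & S2) <= 2
)
```

The minimum is attained when $S_1$ and $S_2$ each remove a different non-zero vector of $B$, which leaves exactly one candidate, in agreement with the bound $2^2 - 3 = 1$.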

Choosing the $Z_c^*$ packet: Using the above notation, how
we choose $Z_c^*$ can be rewritten as follows. Recall that $\mathbf{v}_b$
is the vector for the coded packet $Z_b^*$. We argue that the
construction of $Z_c^*$ in Section IV-B is equivalent to the
following construction. That is, we choose arbitrarily a vector
$\mathbf{v}_c \in C \setminus ((\mathbf{v}_b \oplus D) \cup (\mathbf{v}_b \oplus E))$ and then send $Z_c^* = \mathbf{m}^T \mathbf{v}_c$. The
reason that the above new code construction is equivalent to
the previous description of the repair operations is as follows.
Whenever $\mathbf{v}_c \notin (\mathbf{v}_b \oplus D)$, then the coded packet $Z_c^*$ will not
be a linear combination of $Z_b^*$ and the two packets in node $d$.
Similarly, whenever $\mathbf{v}_c \notin (\mathbf{v}_b \oplus E)$, then the coded packet $Z_c^*$
will not be a linear combination of $Z_b^*$ and the two packets in
node $e$. The choice of $\mathbf{v}_c \in C \setminus ((\mathbf{v}_b \oplus D) \cup (\mathbf{v}_b \oplus E))$ thus
satisfies both conditions simultaneously.

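We argue now that we can always find such a vector $\mathbf{v}_c$.
Specifically, we have that

\begin{align}
\operatorname{rank}(C \cap (\mathbf{v}_b \oplus D)) &= \operatorname{rank}(C) + \operatorname{rank}(\mathbf{v}_b \oplus D) - \operatorname{rank}(C \oplus (\mathbf{v}_b \oplus D)) \notag \\
&\le 2 + 3 - \operatorname{rank}(C \oplus (\mathbf{v}_b \oplus D)) \tag{41} \\
&= 2 + 3 - 4 \tag{42} \\
&= 1, \notag
\end{align}

where (41) follows from $\operatorname{rank}(C) \le 2$ and $\operatorname{rank}(\mathbf{v}_b \oplus D) \le
3$. Equation (42) is due to the following facts. If $\operatorname{rank}(C \oplus
D) = 4$, then obviously we have $\operatorname{rank}(C \oplus \mathbf{v}_b \oplus D) = 4$. If
$\operatorname{rank}(C \oplus D) \le 3$, then by the induction assumptions (A) and
(B) we must have $\operatorname{rank}(C \oplus D) = 3$. In this case, we have
$S_1 = C \oplus D$ when we construct $\mathbf{v}_b$. Therefore, $\operatorname{rank}(C \oplus
\mathbf{v}_b \oplus D) = \operatorname{rank}(C \oplus D) + 1 = 4$. The above argument shows
that $\operatorname{rank}(C \cap (\mathbf{v}_b \oplus D)) \le 1$. Symmetrically, we also have
$\operatorname{rank}(C \cap (\mathbf{v}_b \oplus E)) \le 1$. By the same argument as used in
proving $|B \setminus (S_1 \cup S_2)| \ge 1$, this implies that $|C \setminus ((\mathbf{v}_b \oplus D) \cup
(\mathbf{v}_b \oplus E))| \ge 1$. Therefore, we can always find such a $\mathbf{v}_c$.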

Thus far, we have proven that whenever the induction
conditions (A) and (B) hold for time $\tau = \tau_0 - 1$, we can
always carry out the code construction for time $\tau = \tau_0$. We
now argue that the induction conditions (A) and (B) also hold
after we finish the repair operation in time $\tau = \tau_0$. Since only
node $a$ is repaired, we only need to check the node pairs $(a, b)$ and
$(a, c)$ to make sure they satisfy induction condition (A) and
check node pairs $(a, d)$ and $(a, e)$ to make sure they satisfy
induction condition (B).

The newcomer $a$ now has packets $(Z_b^*, Z_c^*)$ and the span
of the vectors in $a$ is now $A = \operatorname{span}(\{\mathbf{v}_b, \mathbf{v}_c\})$. By our CA
helper selection algorithm, the helper nodes $b$ and $c$ do not
form a parent-child relationship. By induction condition (B),
$\operatorname{rank}(B \oplus C) = 4$. Thus, any non-zero vector $\mathbf{v}_c \in C$ is
independent of the linear space $B$. Therefore, $\operatorname{rank}(A \oplus B) =
\operatorname{rank}(\mathbf{v}_c \oplus B) = 3$. Symmetrically, $\operatorname{rank}(A \oplus C) = 3$.

To prove that $\operatorname{rank}(A \oplus D) = 4$, we notice that since $\mathbf{v}_b \notin
S_1$ and $S_1 \supseteq D$, we must have $\mathbf{v}_b \notin D$. Therefore, $\operatorname{rank}(\mathbf{v}_b \oplus
D) = 3$. Since $\mathbf{v}_c \notin (\mathbf{v}_b \oplus D)$, we have $\operatorname{rank}(A \oplus D) =
\operatorname{rank}(\mathbf{v}_c \oplus (\mathbf{v}_b \oplus D)) = 4$. Symmetrically, we have $\operatorname{rank}(A \oplus
E) = 4$. We can thus see that the nodes satisfy the induction
conditions (A) and (B) after the repair operations in $\tau = \tau_0$.
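One repair step can be traced on a small example. The following is a sketch under stated assumptions, not the code of Fig. 2: the four spans $B$, $C$, $D$, $E$ are hypothetical subspaces of $GF(2)^4$ (vectors encoded as 4-bit integers), $S_2$ is taken to be $E$ when $\operatorname{rank}(C \oplus E) = 4$, and "choose arbitrarily" is implemented as taking the smallest admissible vector. The names `span`, `rank`, `forbidden`, and `repair` are hypothetical.

```python
def span(vectors):
    """All GF(2) linear combinations of 4-bit vectors (ints, XOR is addition)."""
    s = {0}
    for v in vectors:
        s |= {u ^ v for u in s}
    return s


def rank(*spaces):
    """Rank of the sum space of the given subspaces."""
    total = span([v for sp in spaces for v in sp])
    return len(total).bit_length() - 1


def forbidden(C, X):
    """S_i of the text: X itself if rank(C + X) = 4, else the sum space C + X."""
    return set(X) if rank(C, X) == 4 else span(list(C) + list(X))


def repair(B, C, D, E):
    """Pick v_b from helper b and v_c from helper c as described above."""
    S1, S2 = forbidden(C, D), forbidden(C, E)
    v_b = min(B - (S1 | S2))
    bad = span([v_b, *D]) | span([v_b, *E])
    v_c = min(C - bad)
    return v_b, v_c


# Hypothetical spans; b and c are not in a parent-child relationship.
B, C = span([0b0001, 0b0010]), span([0b0100, 0b1000])
D, E = span([0b0101, 0b1010]), span([0b0110, 0b1001])
v_b, v_c = repair(B, C, D, E)
A = span([v_b, v_c])
ranks = (rank(A, B), rank(A, C), rank(A, D), rank(A, E))
```

In this example the newcomer's span has rank 3 jointly with each helper and rank 4 jointly with each non-helper, which is the pattern that induction conditions (A) and (B) require.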

We have shown thus far by induction that we can always
repair the network/code at any time using the regular repair
operations. The above also shows that we can maintain the
induction conditions (A) and (B) at any time. We are thus
only left with showing that we can construct the whole file
from any $k = 3$ nodes. Pick any three nodes in the network.
By the CA scheme, these three nodes do not form a triangle,
i.e., at least one pair of nodes in these three nodes does not
form a parent-child relationship. By induction condition (B),
we have that the 4 packets on these two nodes are linearly
independent and we can use those packets to construct the
file. The proof is hence complete.

Appendix E 

Proof of Proposition 13

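To find the MBR point, we need to find the smallest $\beta$ that
satisfies

$$\min_{\pi_f} \sum_{i=1}^{k} \min\big((d - y_i(\pi_f))^+ \beta, \alpha\big) \ge \mathcal{M}, \tag{43}$$

where $\alpha = \infty$ since we are considering the MBR point. One
can easily see that the minimizing $\beta$, termed $\beta_{\mathrm{MBR}}$, equals
$\frac{\mathcal{M}}{\min_{\pi_f} \sum_{i=1}^{k} (d - y_i(\pi_f))^+}$. Therefore, what remains to be proven
is that

$$\min_{\pi_f} \sum_{i=1}^{k} (d - y_i(\pi_f))^+ = \sum_{i=1}^{k} (d - y_i(\pi_f^*))^+, \tag{44}$$

where $\pi_f^*$ is the RFIP defined in Section VI-C.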

In our previous work [3, Proposition 1], we have proven
the following statements. For any $(n, k, d, r)$ value satisfying
$r = 0$, we have

$$\sum_{i=1}^{k} y_i(\pi_f) \le \sum_{i=1}^{k} y_i(\pi_f^*) \tag{45}$$

$$\text{and } y_{i_1}(\pi_f^*) \le y_{i_2}(\pi_f^*) \text{ if } i_1 < i_2. \tag{46}$$

Namely, the RFIP $\pi_f^*$ maximizes the cumulative sum of
$y_1(\pi_f)$ to $y_k(\pi_f)$ and the $y_i(\pi_f^*)$ value is non-decreasing with
respect to $i$.

We now argue that (45) and (46) hold for arbitrary
$(n, k, d, r)$ values with $r > 0$ as well. In the proof of
Proposition 12, in the paragraph right before (35), we have established
that the $y_i(\cdot)$ function defined for one scenario $(n, k, d, r)$ is
identical to the $y_i(\cdot)$ function defined for another scenario
$(n', k', d', r')$ if we have $n = n'$ and $d + r = d' + r'$. For any
given $(n, k, d, r)$ value, consider another set of parameters
$(n', k', d', r')$ such that $n' = n$, $k' = k$, $d' = d + r$, and
$r' = 0$. Since (45) and (46) hold for any parameter values
with $r = 0$, they must hold for the case of $(n', k', d', r')$
since by our construction we have $r' = 0$. On the other
hand, by the arguments in the proof of Proposition 12, the
$y_i(\cdot)$ functions for both $(n, k, d, r)$ and $(n', k', d', r')$ must be
identical. Therefore (45) and (46) hold for the arbitrarily given
$(n, k, d, r)$ as well.

We now use (45) and (46) to prove (44). Given any
$(n, k, d, r)$ value, we construct the corresponding $y_i(\cdot)$
function and the RFIP $\pi_f^*$. Then we define $k_0 = \max\{x \in
\{1, 2, \ldots, k\} : y_x(\pi_f^*) < d\}$. Namely, $k_0$ is the largest index
$x \le k$ such that $y_x(\pi_f^*) < d$. Since by (46) $y_i(\pi_f^*)$ is
non-decreasing, we must have $y_i(\pi_f^*) < d$ for all $1 \le i \le k_0$ and
$y_i(\pi_f^*) \ge d$ for all $k_0 < i \le k$.

Consider any family permutation $\pi_f$. We now have

\begin{align}
\sum_{i=1}^{k} (d - y_i(\pi_f))^+ &\ge \sum_{i=1}^{k_0} (d - y_i(\pi_f))^+ \tag{47} \\
&\ge \sum_{i=1}^{k_0} (d - y_i(\pi_f)) \tag{48} \\
&\ge \sum_{i=1}^{k_0} (d - y_i(\pi_f^*)) \tag{49} \\
&= \sum_{i=1}^{k} (d - y_i(\pi_f^*))^+, \tag{50}
\end{align}

where (47) follows from that each $(d - y_i(\pi_f))^+$ is
non-negative and we sum over $i = 1$ to $k_0$ for some $k_0 \le k$;
(48) follows from removing the projection $(\cdot)^+$ operator; (49)
follows from (45); and (50) follows by the definition of $k_0$, which
ensures $y_i(\pi_f^*) \ge d$ for all $k_0 < i \le k$. By (50), we get (44).
Hence, the proof is complete.
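The chain (47)-(50) can be traced numerically on a toy instance. The sketch below uses made-up sequences, not values of the actual $y_i(\cdot)$ function: `y_star` stands for $y_i(\pi_f^*)$ and is non-decreasing as in (46), and `y` stands for $y_i(\pi_f)$ of some other family permutation whose prefix sums never exceed those of `y_star`, which is how (45) is used in step (49). All names are hypothetical.

```python
def plus(x):
    """Projection (x)^+ = max(x, 0)."""
    return max(x, 0)


def chain(d, y, y_star):
    """Return the five quantities compared in (47)-(50), in order."""
    k = len(y)
    # k0: largest index (1-based) with y_star below d, or 0 if there is none.
    k0 = max((i + 1 for i in range(k) if y_star[i] < d), default=0)
    return [
        sum(plus(d - yi) for yi in y),
        sum(plus(d - yi) for yi in y[:k0]),
        sum(d - yi for yi in y[:k0]),
        sum(d - yi for yi in y_star[:k0]),
        sum(plus(d - yi) for yi in y_star),
    ]


terms = chain(d=2, y=[0, 1, 1, 2, 2], y_star=[0, 1, 2, 2, 3])
non_increasing = all(a >= b for a, b in zip(terms, terms[1:]))
```

For these toy values the five quantities are 4, 3, 3, 3, 3, so the sum for `y_star` is the smaller one, and with file size $\mathcal{M}$ the corresponding $\beta_{\mathrm{MBR}}$ would be $\mathcal{M}/3$.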

References 

[1] R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, “Network
information flow,” IEEE Trans. Inf. Theory, vol. 46, no. 4, pp. 1204-1216,
2000.

[2] I. Ahmad and C.-C. Wang, “Locally repairable regenerating code
constructions,” [Online]. Available: arXiv:[cs.IT].

[3] I. Ahmad and C.-C. Wang, “When can helper node selection improve
regenerating codes? Part I: Graph-based analysis,” [Online]. Available:
arXiv:1604.08231 [cs.IT].

[4] I. Ahmad and C.-C. Wang, “When can helper node selection improve
regenerating codes? Part II: An explicit exact-repair code construction,”
[Online]. Available: arXiv:1604.08230 [cs.IT].

[5] I. Ahmad and C.-C. Wang, “When and by how much can helper node
selection improve regenerating codes?” in Proc. 52nd Annu. Allerton Conf.
Communication, Control, and Computing, Monticello, IL, Oct. 2014, pp. 459-466.

[6] V. R. Cadambe, S. A. Jafar, H. Maleki, K. Ramchandran, and C. Suh, 
“Asymptotic interference alignment for optimal repair of mds codes in 
distributed storage,” IEEE Trans. Inf. Theory, vol. 59, no. 5, pp. 2974- 
2987, 2013. 

[7] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and
K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans.
Inf. Theory, vol. 56, no. 9, pp. 4539-4551, 2010.

[8] R. Dougherty, C. Freiling, and K. Zeger, “Insufficiency of linear coding 
in network information flow,” IEEE Trans. Inf. Theory, vol. 51, no. 8, 
pp. 2745-2759, 2005. 

[9] P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin, “On the locality 
of codeword symbols,” IEEE Trans. Inf. Theory, vol. 58, no. 11, pp. 
6925-6934, 2012. 

[10] H. D. L. Hollmann, “On the minimum storage overhead of distributed 
storage codes with a given repair locality,” in Proc. IEEE Int. Symp. 
Information Theory (ISIT), Honolulu, HI, Jun. 2014, pp. 1041-1045. 

[11] G. M. Kamath, N. Prakash, V. Lalitha, and P. V. Kumar, “Codes with
local regeneration and erasure correction,” IEEE Trans. Inf. Theory,
vol. 60, no. 8, pp. 4637-4660, 2014.







[12] G. M. Kamath, N. Silberstein, N. Prakash, A. S. Rawat, V. Lalitha, 
O. O. Koyluoglu, P. Kumar, and S. Vishwanath, “Explicit mbr all-symbol 
locality codes,” in Proc. IEEE Int. Symp. Information Theory (ISIT), 
Istanbul, Turkey, Jul. 2013, pp. 504-508. 

[13] L. Pamies-Juarez, H. D. L. Hollmann, and F. Oggier, “Locally repairable 
codes with multiple repair alternatives,” in Proc. IEEE Int. Symp. 
Information Theory (ISIT), Istanbul, Turkey, Jul. 2013, pp. 892-896. 

[14] D. S. Papailiopoulos and A. G. Dimakis, “Locally repairable codes,” 
IEEE Trans. Inf. Theory, vol. 60, no. 10, pp. 5843-5855, 2014. 

[15] N. Prakash, G. M. Kamath, V. Lalitha, and P. V. Kumar, “Optimal linear 
codes with a local-error-correction property,” in Proc. IEEE Int. Symp. 
Information Theory (ISIT), Cambridge, MA, Jul. 2012, pp. 2776-2780. 

[16] K. V. Rashmi, N. B. Shah, and P. V. Kumar, “Optimal exact-regenerating 
codes for distributed storage at the msr and mbr points via a product- 
matrix construction,” IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 5227- 
5239, 2011. 

[17] A. S. Rawat, O. O. Koyluoglu, N. Silberstein, and S. Vishwanath,
“Optimal locally repairable and secure codes for distributed storage
systems,” IEEE Trans. Inf. Theory, vol. 60, no. 1, pp. 212-236, 2014.

[18] N. B. Shah, K. V. Rashmi, P. V. Kumar, and K. Ramchandran,
“Distributed storage codes with repair-by-transfer and nonachievability of
interior points on the storage-bandwidth tradeoff,” IEEE Trans. Inf.
Theory, vol. 58, no. 3, pp. 1837-1852, 2012.

[19] N. B. Shah, K. V. Rashmi, P. V. Kumar, and K. Ramchandran,
“Interference alignment in regenerating codes for distributed
storage: Necessity and code constructions,” IEEE Trans. Inf. Theory,
vol. 58, no. 4, pp. 2134-2158, 2012.

[20] K. W. Shum and Y. Hu, “Cooperative regenerating codes,” IEEE Trans.
Inf. Theory, vol. 59, no. 11, pp. 7229-7258, 2013.

[21] C. Tian, “Characterizing the rate region of the (4, 3, 3) exact-repair 
regenerating codes,” Selected Areas in Communications, IEEE Journal 
on, vol. 32, no. 5, pp. 967-975, 2014. 

[22] D. B. West, Introduction to graph theory. Prentice Hall Upper Saddle 
River, NJ., 2001, vol. 2. 

[23] Y. Wu, “Existence and construction of capacity-achieving network codes 
for distributed storage,” IEEE J. Select. Areas Commun., vol. 28, no. 2, 
pp. 277-288, 2010. 





